ABSTRACT

The dynamic allocation of limited processor and main memory 
resources among the members of a user community is investigated 
as a supply-and-demand problem. The work is divided into four 
phases. 

The first phase is the construction of the working set model 
for program behavior. This model is based on locality, the con- 
cept that, during any interval of execution, a program favors a 
subset of its information; a computation's working set is a dyn- 
amic measure of this set of favored information. A working set 
storage management policy is one that allocates processors to 
a computation if and only if there is enough uncommitted space 
in main memory to contain its working set. Under such a policy, 
a computation acquires and releases storage as needed, independently
of other computations; because computations are thus made
statistically independent, it is possible to derive many detailed
properties of such policies, both in shared and unshared situations.

The second phase is to define and study the properties of 
system demand. A computation is regarded as the basic demand- 
making entity, placing demands jointly on processor and main mem- 
ory resources. Its system demand is a pair (processor demand, 
memory demand), where its processor demand represents its immedi- 
ate processor requirement (intensity and duration), and its mem- 
ory demand represents its immediate main memory requirement (its 
working set size). 

The third phase is to define and study the properties of 
system balance. Computations that demand resources are segre- 
gated into two classes: the first class, called the standby set, 
is temporarily denied the use of system resources; the second 
class, called the balance set, is granted the use of system re- 
sources. The system is balanced when the total system demand 
of the balance set matches the system capacity. A balance policy 
is a resource allocation policy that regulates membership in the 
balance set so that balance is maintained. Balance policies are 
formulated as mathematical programming problems whose solutions 
are found dynamically by the scheduler. 

The fourth phase is to apply all these ideas to the design 
and administration of multiprocess computer systems. A relation 
describing the equipment configuration is derived; suggestions 
for processor and multilevel memory system design are made. Per- 
formance measures are discussed. 

This work is intended to be a new approach to modelling the 
behavior of ongoing computations. It is intended to be a general, 
unified philosophy about allocation and sharing. It is intended 
to spark new thinking about the design and administration of 
multiprocess computer systems.

NOTATION

To prevent confusion, we list here the major notational 
conventions we have used in this thesis. 

Each notation is listed first, with its explanation indented beneath it.

$x \in A$
    x is an element of the set A.

$A = \{x \mid P\}$
    A comprises all elements x having the property P.

$|A|$
    the number of elements in A.

$A \subseteq B$
    the set A is contained in the set B; i.e., every element of
    A is also an element of B.

$B = \bigcup_{i \in I} A_i = \{x \mid x \in A_i, \text{ some } i \in I\}$
    definition of the union of sets.

$B = \bigcap_{i \in I} A_i = \{x \mid x \in A_i, \text{ all } i \in I\}$
    definition of the intersection of sets.

$\Pr[A]$
    probability of the event A.

$\Pr[A \mid B]$
    probability of the event A, conditioned on the occurrence
    of the event B.

$F_x(u) = \Pr[x \le u]$
    probability distribution function for the random variable x.

$f_x(u) = \frac{d}{du} F_x(u)$
    probability density function for the random variable x.

$\bar{x} = \int u \, f_x(u) \, du$
    the mean, or expectation, of the random variable x.

$\overline{x^2} = \int u^2 \, f_x(u) \, du$
    the second moment of the random variable x.

$\sigma_x^2 = \overline{x^2} - \bar{x}^2$
    the variance of the random variable x.

$\overline{g(x)} = \int g(u) \, f_x(u) \, du$
    expectation of the function g(x) of the random variable x.

$\overline{g(x,y)}^{\,x} = \int g(u,y) \, f_x(u) \, du$
    expectation with respect to x of the function g(x,y) of two
    random variables x and y.

CHAPTER 1



The Resource Allocation Problem 



1.0. Introduction to the Resource Allocation Problem 

The desire for a general purpose community computing fac- 
ility — a computer utility — has motivated recent trends in 
computer design. Just as electric power is distributed to the 
members of a community to satisfy their electromechanical needs, 
so-called computing power can be distributed to the members of 
a community to satisfy their information processing needs. 

The essence of a computer utility can be captured in one 
word: sharing. By sharing computing resources, the users distribute
the costs, and each pays less. By sharing information,
one user may build on the work of others, and advance more ra- 
pidly in his own work. Sharing benefits the system, too, for 
the system may select from a wide range of instantaneous demands 
those that tend to improve its efficiency. Resource allocation 
is the problem of distributing limited resources among members 
of the community. 

In recent years we all have watched the evolution of sophisticated
techniques for sharing of equipment and information, techniques such
as multiprogramming, multiprocessing, multi-accessing [C7,C8,D7,P2],
segmentation and paging [D8], and traffic control [S2]. Computer
systems using these techniques have not always met expectations. For
example, the efficiency of paged memory systems has often been much
less than anticipated. There have even been instances of unexpected
behavior: a system may be processing a set of programs using all the
available memory and processor resources, yet introducing one
additional, average-sized program into memory can trigger a total
collapse of service efficiency, leaving almost all the processors idle.
This phenomenon, known as thrashing, at first defies our intuition,
which would instead lead us to expect gradual degradation of service
as additional programs are squeezed into memory.

What causes thrashing? When multiprogramming a memory, what 
is the smallest subset of each program that ought to reside there? 
What (perhaps unwanted) interactions take place among programs 
that compete for the same equipment? Given a set of programs, 
what should be the configuration of processor and memory resources 
to serve them best? What is the best scheduling policy? The 
best storage management policy? How does one predict the resource 
requirements of a program when nothing is known about it before- 
hand? How can one tell if the system is behaving properly? 

The lack of answers to some of these questions, the intense 
debate over others, and the existence of yet unasked questions, 
lead inescapably to this simple conclusion: we do not understand 
the behavior of ongoing computations. 

Thus, multiprogramming, multiprocessing, and all the other 
techniques, are not solutions to the resource allocation problem; 
they are but tools by which a solution may be implemented. 



It is the purpose of this thesis to start filling the gap, 
to develop new approaches to modelling the behavior of comput- 
ations, to spark a new way of thinking about programs in exe- 
cution, to evolve a general, unified philosophy about resource 
sharing and allocation. 

I felt that an interesting and useful solution to the re- 
source allocation problem should be based on the ideas of supply- 
and-demand economics in a free-enterprise market; and this think- 
ing underlies my work. I wanted to formulate resource alloc- 
ation as the problem of selecting fairly from all user demands 
a subset whose total demand balances the supply; I wanted the 
solution to be applicable across a wide range of computer systems,
large and small, existing and proposed, from Multics [C8],
to Dijkstra's harmonious society of cooperating sequential processes
[D11], to the highly parallel machines of Dennis [D10]
and Slotnick [S6]; I wanted the solution to be unified in the
sense that processor and memory allocation are handled together, 
not in two separate decisions. To accomplish these goals, I 
approached the problem in four phases. 

The first phase was the construction of an abstract model 
for program behavior. This model, the working set model, makes
it possible to decide which information is in use by a single
computation or set of computations; intuitively, a computation's
working set of information is the smallest collection of information
that must be present in main memory for it to operate
efficiently. The working set model is based on the concept of
locality, the idea that a computation will, during an interval
of time, favor a subset of the information available to it;
the working set is a dynamic measure of this set of favored
information. A working set memory management policy is one
that guarantees a computation shall receive the use of processors
if and only if its working set is present in main memory.
Under such a policy, computations are made independent, the 
memory acquisitions of one computation being unaffected by those 
of another; thus, unwanted interactions among computations aris- 
ing from competition for memory and processor resources may be 
eliminated. Under such a policy a computation acquires more 
or less memory in accordance with its needs. 
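
As a rough illustration of the idea, the sketch below computes a working set from a page reference string. It assumes, purely for illustration, that the working set at time t is the set of distinct pages referenced during the most recent `window` references; the `window` parameter, the function name, and the sample reference string are assumptions of this sketch, since the formal definition is deferred to Chapter 3.

```python
def working_set(references, t, window):
    """Return the set of distinct pages referenced in the `window`
    references ending at (and including) index t."""
    start = max(0, t - window + 1)
    return set(references[start:t + 1])


# Hypothetical reference string: the program first favors pages 1-3,
# then shifts its locality to pages 7-8.
refs = [1, 2, 1, 3, 2, 1, 7, 8, 7, 8, 7, 8]
early = working_set(refs, t=5, window=4)
late = working_set(refs, t=11, window=4)
print(sorted(early), sorted(late))
```

The size of the returned set is what a working set policy would treat as the computation's current memory requirement; it grows and shrinks as the locality of the program changes.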

The second phase was to define demand. Observing that a
computation jointly demands the use of processor and memory resources,
we defined a computation's system demand to be a pair

(processor demand, memory demand)

where the processor demand represents the computation's immediate
processor requirement (intensity and duration), and the memory
demand represents the computation's main memory requirement
(its working set size).

The third phase was to investigate system balance. We will say
that the system is balanced when the sum total of the demands
of active computations matches the available equipment. This
set of active computations will be called the balance set. A
balance policy is a resource allocation policy that regulates
membership in the balance set so that the balance set, regarded
as a super-computation, has known characteristics; its total
demand is maintained within close tolerance of whatever is required
to match the equipment. We have been able to formulate
the problem of deciding which computations are to be members 
of the balance set as a mathematical programming problem, whose 
solution is found dynamically by the scheduler. 
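
To make the notion of a balance set concrete, the following sketch admits computations to the balance set while their summed demands still fit the equipment. It is only a greedy, first-come illustration and is not the mathematical programming formulation developed later in the thesis. The names `Computation` and `choose_balance_set` are hypothetical, and processor demand is simplified here to a single number (an assumed average number of processors wanted), whereas the text describes it by intensity and duration.

```python
from dataclasses import dataclass


@dataclass
class Computation:
    name: str
    processor_demand: float  # assumed: average number of processors wanted
    memory_demand: int       # working set size, in pages


def choose_balance_set(waiting, n_processors, m_pages):
    """Greedy sketch: admit computations in order while the total demand
    still fits within N processors and M pages; the rest stand by."""
    balance, standby = [], []
    cpu_used, mem_used = 0.0, 0
    for c in waiting:
        fits = (cpu_used + c.processor_demand <= n_processors
                and mem_used + c.memory_demand <= m_pages)
        if fits:
            balance.append(c)
            cpu_used += c.processor_demand
            mem_used += c.memory_demand
        else:
            standby.append(c)
    return balance, standby


jobs = [Computation("a", 0.5, 40), Computation("b", 0.75, 50),
        Computation("c", 0.5, 30), Computation("d", 0.25, 10)]
balance, standby = choose_balance_set(jobs, n_processors=2, m_pages=100)
print([c.name for c in balance], [c.name for c in standby])
```

In this example computation "c" is held back because admitting it would overcommit memory, even though processor capacity remains; this is the sense in which processor and memory allocation are decided together.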



The fourth phase was to apply all these ideas to the design 
and administration of computer systems. A particularly important 
result is: the proper ratio of processor to memory (that is, the 
equipment configuration) that achieves some efficiency level is 
determined not only by the statistics of program size and dur- 
ation, but also by the access time of auxiliary storage devices. 
We are also able to make suggestions about processor design, multi- 
level memory system design, and performance measurements. 



1.1. Plan of the Thesis 

The thesis is organized into four parts. Chapters 1 and 2 
review the concepts with which we want the reader to be familiar; 
Chapters 3, 4, and 5 deal with the working set model; Chapters 6 
and 7 investigate demands, balance, and balance policies; and 
Chapters 8 and 9 look into implications the models have on sys- 
tem design and administration. 

The remainder of the discussion here in Chapter 1 falls into 
two categories: constraints and economics. The most important 
constraint within which we assume a solution to the resource allocation
problem must function, programming generality, is the
independence of an algorithm description from the environment in 
which it operates. One of the consequences of this constraint 
is that the computer system must predict, without outside 
assistance, the demands of the computations it executes. A dis- 
cussion of basic supply-and-demand economic theory is included 
to illustrate how pricing policies can be used to regulate demands. 
Chapter 2 reviews basic multiprocess computer system concepts. 

In Chapter 3 we define the working set model for program 
behavior and show that a working set memory management policy 
is the optimum of all policies that must operate without know- 
ledge of future reference patterns made by computations. In 
Chapter 4 the working set model is refined and a great many of 
its properties are derived. The discussion of Chapters 3 and 4 
is restricted to the case in which no information is shared; ac- 
cordingly we examine in Chapter 5 the effects of sharing. We 
show how dramatically sharing can improve efficiency and reduce 
the resource usage costs attributed to a particular user. 



In Chapter 6 we present the formal definitions of demand 
and balance and discuss basic aspects of balance policies. Chap- 
ter 7 is devoted to formulating balance policies as mathematical 
programming problems. Such formulations have a dual advantage: first,
we need not find explicit solutions for the balance policies as
long as we can convince ourselves that the scheduler is dynamically
finding them; and second, we are assured that the policies
are optimum since the objective functions are clearly stated. 

Chapter 8 deals with applications to computer system design. 
We derive a relation specifying processor-memory configuration, 
we show that pooling of hardware at a fine level of detail can 
achieve the effect of a large number of processors with a small 
amount of hardware, and we discuss organization and management 
of multilevel memory systems in light of generalized working set
models.

Chapter 9 deals with performance measures. Given the models 
and formulation of the solution to the resource allocation prob- 
lem, the performance measures are determined, so Chapter 9 merely 
collects together the major measures discussed in earlier chapters.

The reader who merely wants to get a detailed overview of 
the major work of this thesis, without having to dig through the 
detailed properties of our models, need only read Chapters 1, 
3, 6, and 8, for that is where the main thread lies. 



1.2. The Problem and Its Constraints

We have formulated the problem in the context of a multiprocess
computer system; we presume that the reader is already
familiar with multiprocess computer system objectives, the particular
details of which may be found in references [C8,F1,P2,V2].
Specific implementation concepts will be reviewed in Chapter 2. 
The properties that constrain and complicate the solution to the 
resource allocation problem are discussed below. 

The specific problem toward which this thesis work has been
directed is:

To formulate behavior models of computations in multiprocess
computer systems; then, using the models, formulate a unified
approach to dynamic allocation of processor-memory resources
among computations, balancing supply against demand under
appropriate criteria of fairness.

We have omitted discussion of input-output allocation for three
reasons. First, we assume the rate at which a user interacts
with his computation is very much slower than
the rate at which execution proceeds; we are interested primarily
in dynamic resource allocation in the intervals between
interactions. Second, we feel that the models are general enough
that generalization to resource types beyond processor and
memory will be straightforward. Third, we feel that all the
rich complexity of the resource allocation problem can be found
entirely in the processor-memory problem.

We assume the existence of two kinds of constraints: limited
equipment and programming generality.

The limited equipment constraints center on the existence 
of only a fixed, finite amount of processor and memory resources. 
There are N identical processors, each of which can deliver
information references at the rate of one per unit time; since
the processing rate is bounded, the duration of a program's
execution enters the problem. We assume the standard unit of
information storage and transmission is a page, and that the
capacity of directly-addressable main memory is M pages.
Whenever we talk of the equipment, or the resources, we specifically
mean the N processors and the M pages of main memory.

The remaining constraints center on the issue of programming
generality [D10], which is the independence of an algorithm
description from the environment in which it operates.
Programming generality includes 

1. The ability to move a program between installations,
either manually or automatically (e.g., via computer
networks).

2. The ability to use a program, without changes, despite
changes to the hardware or to the hardware configuration.

3. The ability to use one program in the construction
of another, that is, to build on the work of others and to
share information dynamically.

This third aspect implies that programs will be modular in construction
(i.e., programs will be segmented [D8]). Once compiled,
a program module should be usable without recompilation as a building
block of any program whatever. To exhibit programming
generality, the computer system must permit a program module to: 






1. Create data structures of arbitrary size unknown prior 
to execution. 

2. Call on further procedures unknown to the caller (which
may call on still other procedures, etc.).

3. Transmit arbitrarily complex data structures as arguments.

These last three points, centering on data dependence, imply that
a module's resource requirements not only will be unknown prior
to its execution but also will be indeterminable. Thus, the programming
generality requirement places these constraints on the
resource allocation problem: 

1. The computer system, not the programmer or the compiler, 
must decide for itself where in the memory hierarchy 
information is to reside [D4,D10]. 

2. Algorithms must be configuration independent. Information
references must be made by means of a location-independent
addressing mechanism.

3. Information flows upward in the memory hierarchy only 
on demand, being moved into main memory only when it is 
referenced by a computation. Information flows down- 
ward in the memory hierarchy as it falls out of use. 

4. Arbitrary collections of programs will demand to share 
arbitrary sets of data. Many programs will reside sim- 
ultaneously in main memory (multiprogramming) and many 
processes will be active concurrently (multiprocessing). 

In order to be consistent with programming generality, we
have assumed the no-advance-information constraint, namely that
programmers and compilers will, because of data dependence, be
unable to make reliable advance estimates about the resource needs
of their own programs. In addition, advice obtained
from a programmer is not necessarily useful
even if it is reliable: a user would intend to optimize the
environment for his own program, and configuring resources to suit
an individual may interfere with overall good service to the community.
In order to guard against dishonest users who attempt
to secure better service by misrepresenting their needs, the system
must monitor program behavior and impose penalties for bad
estimates. The additional overhead to do this may not be worth
the cost.

Since it is not at all clear that advice obtained from pro- 
grammers or compilers can be of any real value, we have chosen 
to formulate a solution to the resource allocation problem in 
the case where there is no advice, where the computer system 
must discover for itself how programs behave. Clearly there will 
be situations in which advice can be useful, but these are not
of interest to us here. 

In the interest of programming generality we make the fol- 
lowing distinction between the tools and the methods of resource 
allocation: 

1. The mechanisms, or machinery, of assigning and releasing
equipment must operate on a low level in that they deal
directly with the hardware features of the system. Some
of these tools include multiprogramming, multiprocessing,
segmentation, paging, interprocess communication, etc.



[Footnote: There have been attempts to do this. Ramamoorthy [R1], for
example, has a proposal for automatic segmentation of programs during
compilation.]

2. The policies of resource allocation operate on a higher
level, in that criteria used to determine when equipment
is to be allocated to a computation can be machine-independent.
They are machine-independent inasmuch as
no detailed knowledge of machine organization is necessary
or even relevant.

Such a functional separation permits changing policies without
changing machinery, if the machinery is properly defined.

Since many of the mechanical aspects of resource sharing
already have substantial solutions, we begin investigation at the
machine-independent level. Resource allocation policies may be
grouped into two classes:

1. Short-term policies, which must be handled by the computer
system, since decisions must be made on a time
scale far faster than human response.

2. Long-term policies, primarily economic, which control
demands over long periods of time.

In our work here, short-term policies are concerned with matching 
the demand to the supply, long-term policies with matching the 
supply to the demand. 

The bulk of the thesis is concerned with models that show 
how to define the short-term, balance policies. After a detailed 
discussion in the next section of why balance was chosen as a 
resource allocation goal, we turn attention to a discussion of 
the long-term, economic policies.

1.3. Why Balance?

There were good reasons to choose balance as the objective 
of resource allocation policies, rather than other criteria such 
as maximum equipment utilization or minimum response time. 

The most important reason, already stated, is our desire 
to be consistent with the ideas of supply-and-demand economics. 

The remaining reasons are the results of this thesis. We 
state them here, although many of their justifications will not 
come completely to light until Chapter 7. 

First, balance is conceptually simple and mathematically tractable,
and it ensures a reasonable policy with respect to criteria
such as maximum equipment utilization or minimum response time.

Second, we will show that its relative simplicity not only
makes performance testing and evaluation straightforward but also
makes clear which parameters are important. Moreover, its relative
simplicity makes implementation easy.

Third, we will show that balance exercises control over the 
factors that cause thrashing; recall that thrashing denotes the 
sudden collapse of service efficiency that may occur when too 
many programs are squeezed into main memory. 

Fourth, balance compromises between the conflicting objectives
of fast fair service and low equipment idleness. We illustrate
the dichotomy. Figure 1-1a shows a server, before which
is a queue of demands for its use, the average number in the system
being $\bar{n}$; we regard $\bar{n}$ as a measure of the demand for use of the
server. When there are no demands in the system, the server is
idle, an event occurring with probability $p_0$. The average wait
in the system is $\bar{w}$. Figure 1-1b shows how $p_0$ varies with $\bar{n}$.

[Figure 1-1. Tradeoff between waiting time and idleness. Panels: (a) Queue and server. (b) Idleness vs. demand. (c) Waiting time vs. demand. (d) Waiting time vs. idleness.]

Figure 1-1c illustrates that under a fair service discipline
(i.e., one in which waiting time depends only on order of arrival)
the expected wait varies linearly with $\bar{n}$ (see reference [S1],
p. 42). Figure 1-1d, showing directly the relation between $\bar{w}$
(service) and $p_0$ (idleness), is constructed by choosing various
$\bar{n}$ and finding the corresponding $p_0$ and $\bar{w}$. In general, as $p_0$
decreases, $\bar{w}$ increases: there is an inverse relation between fast
fair service and low equipment idleness. As we will see, balance
exercises control over this relation.
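
The tradeoff can be tabulated for one concrete queueing model. The sketch below assumes, as an illustration only, a single-server queue with Poisson arrivals, exponential service, and first-come-first-served order (the M/M/1 queue); the text does not say which model underlies Figure 1-1. The function name and the parameters `rho` (utilization) and `mu` (service rate) are assumptions of this sketch. For this model the mean number in the system is rho/(1 - rho), the idle probability is 1 - rho, and the mean time in the system is (mean number + 1)/mu, which is linear in the mean number.

```python
def mm1_measures(rho, mu=1.0):
    """For an M/M/1 queue with utilization rho (0 <= rho < 1) and service
    rate mu, return (mean number in system, idle probability, mean wait)."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    n_bar = rho / (1 - rho)
    p0 = 1 - rho
    w_bar = (n_bar + 1) / mu  # mean time in system, linear in n_bar
    return n_bar, p0, w_bar


# Sweep the load: as demand rises, idleness falls and waiting time grows.
for rho in (0.25, 0.5, 0.75, 0.9):
    n_bar, p0, w_bar = mm1_measures(rho)
    print(f"rho={rho:.2f}  n={n_bar:.2f}  p0={p0:.2f}  w={w_bar:.2f}")
```

Under these assumptions the mean wait equals 1/(mu * p0), so driving idleness toward zero drives the wait without bound, which is the inverse relation described above.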

Fifth, we will show that a balance policy can be implemented 
in a relatively load-independent way, the amount of work needed 
to maintain balance depending on the distance of an actual load 
point to a desired load point. 

Sixth, the abstract model of a balanced computer system will 
show the relation between equipment configuration, the auxiliary 
memory access time, and balance.

1.4. Supply-and-Demand Economic Principles

This section surveys aspects of the economic structure under- 
lying our thinking. We consider here a form of supply-and-demand 
computer system economics. 

One motivation for a multiprocess computer system is economic:
there is a community of users, who individually would be
unable to afford the full services of a computer system, but who
collectively can pay the costs. This goal, cheap computing,
is not attainable solely within a multiprocess computer system.
The ability of one user to share and build on the work of others
is a far more compelling motivation. Yet sharing complicates,
among other things, the problem of charging users for resource
consumption, because now the cost of a shared resource must be
attributed to the participants in accordance with their degrees
of participation.

The perhaps overworked term computer utility can be misleading, 
for it is not entirely analogous to the public utilities as we 
know them. Contemporary public utilities are rather large eco- 
nomic systems where the average demand is known to vary slowly; 
for the immediate future, computer utilities will be rather small 
economic systems, subject to fast-changing demand. Public util- 
ities are relatively much larger than computing systems; in a 
computer utility, any user can easily demand every resource. 
Public utilities have physical limits on the quantity of service 
a customer can obtain (a 150 ampere main circuit breaker in his 
house, a 3-inch water main, or 2 telephones), and this need not 
be the case for computer utilities. 

From now on we shall refer to the management personnel of
the computer system as the administration. It is the responsibility
of the administration to properly manage the system, deciding
who is to use it, how prices are to be set, what additional equip- 
ment is to be purchased, what services are to be offered. Never- 
theless, the burden of managing the system lies mostly on the 
system itself; for example, it must provide automatic metering 
of resource usage and maintain data on demands. The models we 
set up will make it clear what should be metered and how, and 
what demand distributions should be determined and how. 



1.4.1. Demand Curves 

The administration can exercise economic controls over the 
demands of the user community by means of the prices it charges.

Figure 1-2 shows an elementary demand curve, typifying the 
relation between price per unit resource and the total demand 
from the community. We observe that the higher the price per 
unit resource the less is the total community demand. Point A 
is the intersection between the amount R of resource currently 
provided by the system and the demand curve. If the price is 
less than $p_A$, the price at point A, the user community demand will exceed the supply R.
If the administration wishes to hold some resource in reserve,
leaving only a fraction a of the R available, it must raise the
price above $p_A$, to the level at which demand falls to that fraction. We do not wish to consider issues such as how to
set price to maximize profit, what to do if the demand curve is 
time varying, how long it is until a price change is felt, or 
whether instability will result from feedback between demand
and price. The point is: price is a lever for controlling the
total community demand.

[Figure 1-2. A demand curve.]
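
The lever can be illustrated numerically. The sketch below assumes a hypothetical linear demand curve; the shape of the curve, the parameters `d_max` and `slope`, and the value of R are invented for illustration and are not taken from the thesis. It finds the price at which demand equals the supply R, and the higher price needed to hold 20 percent of R in reserve.

```python
def demand(price, d_max=200.0, slope=10.0):
    """Hypothetical linear demand curve: total community demand falls as
    the price per unit resource rises (never below zero)."""
    return max(0.0, d_max - slope * price)


def clearing_price(target, d_max=200.0, slope=10.0):
    """Price at which the hypothetical demand curve equals `target`."""
    if not 0 <= target <= d_max:
        raise ValueError("target must lie between 0 and d_max")
    return (d_max - target) / slope


R = 100.0                            # resource currently provided
p_a = clearing_price(R)              # price where demand equals R
p_reserve = clearing_price(0.8 * R)  # price leaving 20% of R in reserve
print(p_a, p_reserve, demand(p_a - 1.0) > R)
```

Any price below `p_a` produces demand in excess of R, and holding resource in reserve requires a price above `p_a`, as the text states.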

The demand curve of Figure 1-2 represents the behavior of 
some economic community in statistical steady state; therefore 
no claim can be made that at any particular time the demand curve 
is reliable. This further motivates the idea of dynamic balance: 
at any time the demand is closely regulated, known to be within 
close tolerance of the desired level. 

An essential component of a supply-and-demand pricing policy 
is the ability for a user to bid. Should a user desire improved
service (at correspondingly higher prices) he may outbid his fellows.
Should a user be unconcerned with the quality of service,
he may underbid, obtaining poorer service at reduced price. By 
assuming the existence of a bidding mechanism, we may ignore cer- 
tain delicate questions surrounding the issue of user dissatis- 
faction; that is, we will not attempt to model dissatisfaction, 
hoping that unhappy users will raise their bids, or leave. We 
shall discuss details of bidding mechanisms in Section 1.4.3. 

Such an atmosphere of free enterprise, incorporating supply-and-demand
resource allocation and competitive bidding for priority,
can quite possibly wreak havoc with computer system economics,
there being a serious threat of inflation. In terms of
Figure 1-2, bidding gradually forces the demand curve up and to
the right. There are two extremes of thought concerning the
administration's posture toward inflation:

1. Do nothing. Just as other public utilities do, meter
resource usage, but allow users as much as they need.
This means that the administration must be willing to
expand the system, adding new equipment so long as
someone is willing to pay for it. It also means that
the administration must be able to detect trends in the
community demand so that it can decide far enough in
advance to order new equipment.

2. Tight controls. The administration should exercise control
over the total demand by allocating resource quotas
to users, and by limiting the total number of users.
This in some ways resembles the policies of parking lot
officials, who allocate 150 stickers to fill 100 spaces,
on the grounds that (on the average) only 100 cars will
show up. The quotas allocated will depend on careful
interpretation of demand statistics, and should be set
so that the number of users trying to use the system
at one time will present a total demand only slightly
larger than system capacity.

By itself, the first alternative is not workable because there 
is a physical limit to how much a particular installation may be 
expanded, and users seem always to manage to find problems that 
consume the capacity of the system, no matter how large it is. 
By itself, the second alternative is not workable because it im- 
plies gradual degradation of service, since the system cannot 
meet the needs of the existing community. A truly flexible pos- 
ture is a compromise between the two extremes: the administration 
must be prepared both to enforce controls and to expand capacity. 
[But who is to insure that the administration indeed takes such 
steps, when it is the one who profits by the inflation?] 
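
The parking lot analogy under tight controls can be given a number. The sketch below assumes, purely for illustration, that each quota holder shows up independently with the same probability, so that the number present at once is binomially distributed; the independence assumption and the function name are ours, not the text's.

```python
from math import comb


def prob_demand_exceeds(capacity, holders, p_show):
    """Probability that more than `capacity` of `holders` independent
    quota holders show up at once, each with probability p_show."""
    return sum(comb(holders, k) * p_show**k * (1 - p_show)**(holders - k)
               for k in range(capacity + 1, holders + 1))


# 150 stickers, 100 spaces, and on the average 100 cars show up.
print(round(prob_demand_exceeds(100, 150, 100 / 150), 3))
```

Under these assumptions, matching the average demand to capacity still leaves a substantial chance that demand exceeds capacity at any given moment, which is why quotas alone do not remove the need to monitor for overload.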

In order to implement the compromise, the administration 
must monitor performance and detect overload. Overload may be 
defined as follows. First set tolerance limits on service, such
as maximum allowable response time, or minimum allowable service
rate (that is, the fraction actually received of the resource 
demanded). Overload exists when the probability that service is 
not within the set limits exceeds some specified number; this 
probability is measured as the fraction of time service is poor. 
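
The overload test just described can be sketched in a few lines. This is a minimal illustration only, assuming that service is sampled at equally spaced instants and that the tolerance limit is a maximum response time; the function names, the sample values, and the threshold are hypothetical and do not come from any particular system.

```python
def poor_service_fraction(response_times, max_response):
    """Fraction of equally spaced samples whose response time exceeds the limit."""
    if not response_times:
        return 0.0
    poor = sum(1 for r in response_times if r > max_response)
    return poor / len(response_times)


def is_overloaded(response_times, max_response, threshold):
    """Overload exists when the measured fraction of poor service exceeds threshold."""
    return poor_service_fraction(response_times, max_response) > threshold


# Illustrative response times in seconds (assumed data).
samples = [1.0, 2.5, 4.0, 1.5, 6.0, 2.0, 5.5, 1.0]
print(poor_service_fraction(samples, max_response=3.0))
print(is_overloaded(samples, max_response=3.0, threshold=0.25))
```

Here three of the eight samples exceed the limit, so the measured fraction of poor service is 0.375 and the system is declared overloaded for any threshold below that value.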

Even if the administration might want to decide against a 
quota system, the users may still desire some such system, for 
self-protection. By having a self-imposed quota, a user can protect
his pocketbook from a berserk computation. Should one of his
programs run amok, a quota would be exceeded and execution
interrupted, the user being asked to decide whether to continue.
Moreover, there should be some means whereby a user controls dis- 
tribution, among his own computations, of whatever resources have 
been allocated to him. This is particularly useful if the user 
supervises some project and desires to control spending by
subordinates.

What is to be done when total demand temporarily exceeds 
capacity? Should all jobs be given equally poor service? Or, 
should jobs be divided into two classes, one to receive good ser- 
vice, the other to receive no service at all? As we shall see, 
the first alternative results in a high rate of resource multi- 
plexing and can easily cause thrashing; the second alternative 
may result in some jobs receiving no service. Balance can be 
used as a compromise: the balance set is that subset that receives 
all the service, but the membership in it is constantly changing.

1.4.2. Priorities

In general, the higher a job's priority, the better the ser- 
vice it obtains. Basically, there are three classes of priority 
used in today's computer systems [C3]:

1. Bought, by paying extra for better service. An example
is the bidding mechanism discussed in the next
section.

2. Acquired, by displaying favorable or unfavorable characteristics
during execution. An example is the CTSS
multilevel queue [C6, S3] in which long jobs receive
little attention.

3. Deserved, by displaying favorable characteristics in
advance of execution. An example, again, is CTSS
[C6, S3], which gives jobs of small memory requirement
better treatment than those of large memory requirement.

A given computer system may employ a combination of these three 
types of priority. 

In our work here we shall consider only bought priority, 
and ignore acquired and deserved priorities. We ignore acquired 
priority because we deal with only completely fair resource al- 
location policies. We ignore deserved priority because we assume 
there is to be no advance allocation information.

1.4.3. Bidding

We shall adopt the point of view that the bidding mechanism 
is a method by which a user purchases priority from the computer 
system (Kleinrock uses the more colorful term bribing [K2]). The 
cost of priority will be added to a user's resource-consumption 
costs. If a user buys higher than average priority, the cost is 
positive (his bill is increased); if he buys lower than average 
priority, the cost is negative (his bill is reduced). By adopting
this view, we insure that inflation due to bidding is on
the cost of the priority and not on the cost of the resources
themselves.

Let $(p_1, p_2)$ be an interval of the real line; any point $p$
in $(p_1, p_2)$ is a possible priority. If a user takes no action
to obtain priority, he is assigned some standard priority $p_0$.
Otherwise he selects some priority $p$ from $(p_1, p_2)$. There is
a cost-of-priority function $G(p,t)$ satisfying

$$
\begin{aligned}
G(p,t) &> 0 && \text{if } p \in (p_0, p_2) \text{ at time } t \\
G(p,t) &= 0 && \text{if } p = p_0 \text{ at time } t \\
G(p,t) &< 0 && \text{if } p \in (p_1, p_0) \text{ at time } t
\end{aligned}
\tag{1.4.1}
$$

Let $C_k(I)$ represent the resource-consumption cost for user $k$
in the real time interval $I$; then user $k$ would be billed

$$
C_k(I) + \int_I G(p,t)\,dt \tag{1.4.2}
$$

There is clearly an incentive for a user to underbid his fel- 
lows, and a restraint against his overbidding. 

Let $q_1, \ldots, q_n$ be the priorities of each of the $n$ users at
a certain time, and define the average priority to be

$$
q = \frac{q_1 + \cdots + q_n}{n} \tag{1.4.3}
$$

An interesting example of a cost function is

$$
G(p,t) = G(p) = C_0 A^{h(p)} \log_A h(p), \qquad h(p) = \frac{p}{q} \tag{1.4.4}
$$

for suitable constants $C_0$ and $A$. The reader can verify that $G(p)$
satisfies properties (1.4.1). Since G(p) increases exponentially 
with the deviation from the mean q, it is possible to penalize 
a user severely for large deviations. This discourages those 
who would outbid the entire community and pre-empt all service 
for themselves. Observe, however, that if everyone bids high, 
q increases and the relative cost of a high bid is less. Thus 
inflation can be a serious problem (but note: the inflation is 
on the cost of priority, not on the cost of resources, and is 
not as serious as inflation of the resource costs themselves). 
The administration can control inflation by replacing $q$ with $p_0$
in eq. 1.4.4, and making $p_0$ smaller than the existing $q$.
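
The behavior of the cost function in eq. 1.4.4 can be checked numerically. The following is a minimal sketch, assuming positive priorities, the illustrative constants C0 = 1 and A = 2 (any A > 1 gives the same signs), and a priority held constant over the billing interval so that the integral in eq. 1.4.2 reduces to a product. The function names and numeric values are hypothetical.

```python
import math


def priority_cost(p, q, c0=1.0, a=2.0):
    """G(p) = C0 * A**h(p) * log_A(h(p)) with h(p) = p / q (eq. 1.4.4).

    Assumes p > 0, q > 0 and a > 1; c0 and a are illustrative constants.
    """
    h = p / q
    return c0 * a ** h * math.log(h, a)


def bill(resource_cost, p, q, duration, c0=1.0, a=2.0):
    """Eq. 1.4.2 for a priority p held constant over an interval of given length."""
    return resource_cost + priority_cost(p, q, c0, a) * duration


q = 0.5  # reference priority: the current average, or the standard priority
for p in (0.25, 0.5, 1.0):
    print(p, priority_cost(p, q))

# Inflation: the same bid costs less once the average has risen.
print(priority_cost(1.0, 0.5), priority_cost(1.0, 0.8))
```

A bid below the reference priority yields a negative cost, a bid equal to it yields zero, and a bid above it yields a positive cost, in agreement with properties (1.4.1). The last line shows the inflation effect: the cost of a fixed bid falls as the average rises.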

We do not want the purchased priority to modify the demand 
of a job, for a simple reason. Should priority be allowed to 
modify a job's demand, operation of a balance policy would col- 
lapse: the scheduler would fail to keep the balance set demand 
at the desired level because the demands of its members were not 
accurately reported. 

The position of a job within its queue depends on the part- 
icular interpretation of the priority p it possesses. The two 
possibilities are: 

1. Fixed priority. An incoming job of priority $p$ is placed
ahead of any job with priority less than $p$, but behind
any job with priority greater than or equal to $p$.

2. Percentile priority. If the priority range $(p_1, p_2)$ is
taken to be $(0,1)$, then $p$ may be interpreted as a percentile.
That is, the user wishes to be always ahead
of $100p$ per cent of the jobs. An incoming job of priority
$p$, arriving to a queue of length $n$, is placed a
distance $(1-p)n$ from the front of the queue.

The fixed priority interpretation will in general mean that a 
user experiences different degrees of improved (or degraded) ser- 
vice, depending on the instantaneous demand of his job. For 
example, suppose his job oscillates between two demand classes, 
designated A and B, and there is a separate queue for each class. 
Let $p_A$ denote the largest priority of a class A job, $p_B$ the smallest
priority of a class B job, and $p_A < p_B$. Suppose the user happens
to choose his priority to be $p$ such that $p_A < p < p_B$. When in
class A, he receives the best of service; when in class B, the
worst. The percentile method circumvents this difficulty, always
giving the user the same improvement (or retardation) relative 
to other users. 
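
The two interpretations can be compared with a short sketch. It is an illustration under stated assumptions: a queue is represented as a list of (job, priority) pairs with the front of the queue first, priorities lie in (0,1), and the distance (1-p)n is rounded down to a whole position. The function names and the sample queues are hypothetical.

```python
def insert_fixed(queue, job, p):
    """Fixed priority: go behind every job with priority >= p, ahead of the rest.

    The queue is assumed to be ordered front first, highest priority first.
    Returns the position at which the job was placed.
    """
    index = 0
    while index < len(queue) and queue[index][1] >= p:
        index += 1
    queue.insert(index, (job, p))
    return index


def insert_percentile(queue, job, p):
    """Percentile priority: place the job a distance (1 - p) * n from the front.

    Rounding down to a whole position is an assumption of this sketch.
    """
    index = int((1.0 - p) * len(queue))
    queue.insert(index, (job, p))
    return index


class_a = [("a1", 0.4), ("a2", 0.2)]   # largest class A priority is 0.4
class_b = [("b1", 0.9), ("b2", 0.6)]   # smallest class B priority is 0.6

print(insert_fixed(list(class_a), "user", 0.5))
print(insert_fixed(list(class_b), "user", 0.5))
print(insert_percentile(list(class_a), "user", 0.5))
print(insert_percentile(list(class_b), "user", 0.5))
```

With a fixed priority of 0.5 the user lands at the front of the class A queue and at the back of the class B queue; with a percentile priority of 0.5 he lands in the middle of both.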

We conclude by noting an interesting way to implement bidding. 
Each console is provided with a potentiometer, calibrated on the 
range $(p_1, p_2)$; the user may continuously adjust his priority.
This can be enhanced by supplying a meter, also calibrated on
the range $(p_1, p_2)$, which indicates the current average priority
$q$ across all users, and the particular user can adjust his own
priority with respect to the average. The existence of such a 
meter constitutes instantaneous feedback between an economic sys- 
tem and the competitors: some very interesting inflation and de- 
flation effects could occur, perhaps even resulting in conditions 
very similar to those in the stock market in 1929.

CHAPTER 2



The Environment 



2.0. Introduction 

The environment, consisting of the hardware and the software
operating system, plays an important role in the resource allocation
problem. The reader should already be familiar with the
concepts of virtual computer, of segmentation and paging [D8],
of program and address-mechanism structure [A1], of a process
and parallel processes [D9], and of virtual time. We shall review
these concepts here in order to establish the complete picture
(as we see it) of a multiprocess computer system.

2.1. The Basic System

For ease of understanding both operation and design, it is 
usual to view the processing function and memory function separ- 
ately in a computer system. The processing function performs 
transformations on information stored by the memory function. 
The processing function is usually implemented by one or more 
processors, and the memory function by one or more memory modules. 

To satisfy the system objectives requiring expandability, 
reliability, and continuous availability, modular hardware con- 
struction is common: the processing function becomes a pool of 
identical processors with free and unrestricted access to a pool 
of identical memory modules. Removing (adding) a device from 
(to) a pool reduces (increases) the capacity of the pool. Within 
a pool each device is anonymous, there being no a priori assign- 
ment of any particular task to any particular device. 

The high cost of directly-addressable memory forces memory 
systems to consist of at least two levels: 

1. main memory. No information can be processed unless it
is present in main memory. Main memory is usually a 
magnetic core memory, though it could just as well be 
any other directly-addressable storage device, such as 
a thin-film memory. Other terms for main memory are 
primary memory and execution store. 

2. auxiliary memory. Information which for one reason or
another cannot be stored in main memory is stored in auxiliary
memory. Examples of auxiliary memory are drums,
disks, and tapes, although a slow-speed core memory might 
also be used for this purpose. Other terms for auxili- 
ary memory are secondary memory and backup store. 



Main memory has relatively high cost, but also has rapid access 
time; auxiliary memory has low cost, but also has slow access 
time. 

Initially we shall restrict attention to a computer with a 
two-level memory system, indicated by Figure 2-1. After having 
studied program models, we shall generalize to multilevel memory 
systems; this will be done in Chapter 8. 

We assume that the unit of information storage and transfer 
is the page. We suppose the capacity of main memory is M pages,
and the capacity of auxiliary memory is infinite. 

The N processors and M main memory pages will be called the 
equipment. For generality we assume that only a fraction $\alpha$, for
$0 < \alpha < 1$, of the N processors are available, and that only a
fraction $\beta$, for $0 < \beta < 1$, of the M memory pages are available.
The $\alpha N$ processors and the $\beta M$ memory pages constitute the
available equipment. It is against the available equipment that we want
to balance demand.

We suppose that each processor can deliver one reference 
per unit time, and that each item in main memory can be refer- 
enced no more than once per unit time, so that the processor and 
main memory speeds are matched. This unit of time will be called 
a virtual time unit (vtu).

There is a time T, the traverse time, involved in moving
one page between memory levels. T is measured from the moment 
a page is found to be missing from main memory until the moment 
the missing page has been placed in main memory ready for use. 
T is actually the expectation of a random variable composed of 
waits in queues, access times, mechanical positioning delays, 
and transmission times. We shall regard the traverse time T as 
being the same regardless of which direction a page is moved.

Dividing memory into two levels creates the first allocation
problem: storage management, the problem of deciding which information
is to reside in main memory and which is not. Generally,
the least-used information must be stored in auxiliary memory; 
the most-used information must be ready for use in main memory. 
When a processor makes a reference to a page not in main memory, 
a page fault occurs, initiating action to secure the missing page 
from auxiliary memory. We thus assume pages are brought into 
main memory on demand only. Because not every useful page may
reside in main memory, there will be a flow of information — 
called page traffic — along the channel bridging the two levels. 
The activity of moving pages in and out of main memory is called 
page-turning, or simply paging.

Nowhere in Figure 2-1 have we indicated the existence of 
input-output equipment, the media used by programs to commun- 
icate with the outside world, because we are not concerned with 
this type of allocation in this thesis (see Section 1.2).

2.2. Multiprocess Computer System Concepts

Two basic principles in the design of multiprocess computer 
systems are the abstractions of the notions name space from 
memory, and process from processor. In the interest of programming
generality, a user is given the illusion that he is dealing
with a (configuration-independent) virtual computer. The virtual
computer comprises one or more virtual processors each having 
most of the capabilities of a real processor, and a virtual mem- 
ory having many times the capacity of the real memory. Because 
the virtual memory has so large a capacity, the user sees no 
auxiliary memory; for this reason virtual memory is often called 
a one-level store [K1]. It is the task of the operating system
both to simulate virtual memory by paging information into real 
memory, and to simulate virtual processors with real processors. 
In Multics, the traffic controller mechanism [S2] handles assign- 
ment of real processors to virtual processors, and communication 
among virtual processors. 

The first abstraction, name space, is the set of names
(addresses) available to a virtual processor for use as data
identifiers.

For convenience (to the user) the name space is divided 
into segments of arbitrary size. To reference a datum, a two-component
address (S,W) is given, S being the name of a segment,
and W being the name of a word within S. Because names have two
components, the name space is often called two-dimensional.
There is no a priori relation between a name in name space and
the location of the corresponding datum in physical memory; this
correspondence is established dynamically by the address-mapping
mechanism [A1].

For convenience (to the system) in mapping segments of arbitrary
size into a memory of fixed size, segments and real memory
are divided into equal-size blocks, called pages. The page,
invisible to the programmer, is the standard unit of information 
storage and transmission. We may thus regard the name space as 
being sliced into equal-size regions. 

Associated with each segment is a page table (itself a page) 
listing each page of the segment. If a page is not in main mem- 
ory, an in-core bit of the corresponding page table entry is OFF; 
an attempt by a virtual processor to reference such a page auto- 
matically causes a missing page fault, which interrupts execution 
of the virtual processor and initiates action to secure the mis- 
sing page from auxiliary memory. After a lapse of at least one 
traverse time (T) the page has been placed in main memory and is 
ready for use; the proper page table entry is set to point to 
the physical memory location of the start of the page, the in- 
core bit is turned ON, and execution of the interrupted virtual 
processor is resumed. Later on, when the page is removed from 
main memory, the in-core bit of the corresponding page table entry 
is again turned OFF. 
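
The page table mechanism described above can be sketched as follows. This is a simplified illustration, not the mechanism of any particular system: the class and method names are hypothetical, the delay of one traverse time is represented only by a comment, and the sketch assumes a free main memory location is always available (no replacement rule is modeled).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageTableEntry:
    in_core: bool = False          # the in-core bit
    frame: Optional[int] = None    # main memory location of the page, if present


class Segment:
    def __init__(self, n_pages):
        self.page_table = [PageTableEntry() for _ in range(n_pages)]
        self.faults = 0

    def reference(self, page, free_frames):
        """Reference a page; on a missing page fault, secure it from auxiliary memory."""
        entry = self.page_table[page]
        if not entry.in_core:
            self.faults += 1                 # fault interrupts the virtual processor
            entry.frame = free_frames.pop()  # stands in for one traverse time T
            entry.in_core = True             # in-core bit turned ON
        return entry.frame

    def remove(self, page, free_frames):
        """Remove a page from main memory and turn its in-core bit OFF."""
        entry = self.page_table[page]
        if entry.in_core:
            free_frames.append(entry.frame)
            entry.frame = None
            entry.in_core = False


segment = Segment(n_pages=4)
free = [7, 8, 9]                   # assumed free main memory locations
segment.reference(2, free)         # first reference faults
segment.reference(2, free)         # second reference finds the page in core
segment.remove(2, free)
segment.reference(2, free)         # faults again after removal
print(segment.faults)
```

The example produces two faults: one on the first reference to page 2 and one on the reference that follows its removal from main memory.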

It is apparent that pages are on a lower level of abstrac- 
tion than segments. The operating system should not attempt to 
have each page of a segment in main memory; it should instead 
attempt to have each useful page in main memory. For it is pos- 
sible that only some of a segment's pages are in use, and there 
is no need to strain main memory resources by keeping useless 
pages there. Roughly speaking, a working set of pages is the 
smallest collection of pages that must be present in main memory 
for a program to operate efficiently. Storage allocation should
attempt to keep at least the working set of each running program
in main memory. 

The second abstraction, process, is the notion of a program
in execution by a virtual processor. In our work here, we use
the equivalent definition: a process is an ordered sequence of
references to information in name space, under the control of 
an instruction stream. A process is sometimes referred to as 
a thread of control through an instruction sequence. A process 
has four states of existence in real time: 

1. running, meaning that it is receiving the use of a real
processor; alternatively, that a real processor is assigned
to its virtual processor.

2. ready, meaning that it is demanding, but not receiving,
the use of a real processor; alternatively, it is suspended
only because no real processor is currently assigned
to its virtual processor.

3. page wait, meaning that it is temporarily suspended because
a page is missing from main memory. Execution is
resumed as soon as the missing page has been placed in
main memory and a processor is available. We take the
duration of a page wait to be the traverse time T.

4. blocked, meaning it has no use for a real processor because
it is awaiting the occurrence of some (expected)
external event, such as a message or signal from another
process, from a device, or from a user at a console.

Figure 2-2 illustrates the possible transitions among these states. 
The transition from running to ready under a pre-emption means 
that the operating system has required the real processor for 
some other use, for example to execute another process. The 
transition from ready to running under go-ahead means that the
operating system has decided to return the processor to this
process.

Figure 2-2. States of a process.

In Figure 2-2 we have indicated a transition from page wait 
directly back to running, when in fact this need not be the case. 
It is the case if a processor is dedicated to the process and is
immediately able to resume execution when the process returns from page wait.
But, if the page wait time T is larger than the time it takes 
to switch the processor to another process, it is uneconomical 
to dedicate a processor to a single process, and in this case 
a process returns to running status via ready status. In our 
work here, we assume a sufficiency of processor resources, so 
that at worst a negligible delay is experienced by a process as 
it passes through ready to running. This is the justification 
for the direct page-wait to running transition shown in Figure 2-2. 

When talking about processes we shall make a distinction 
between virtual time (vt) and real time. Virtual time is time
as seen by a process as if it were never interrupted; that is, 
the total accumulated time in the running state. Virtual time, 
also called execution time or process time, is measured in virtual
time units (vtu), usually memory cycles. Put another way,
a virtual time unit is the interval between any two of the suc- 
cessive information references that constitute a process. We 
shall usually regard virtual time as being continuous, even though 
it is actually finely divided into small units. Finally, real 
time is virtual time with page wait, blocked, and ready delays 
inserted appropriately. 
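
The distinction between virtual time and real time can be made concrete with a small sketch. It assumes that the history of a process is available as a list of (state, duration) pairs, with durations in virtual time units; the names and the durations shown are hypothetical.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    READY = "ready"
    PAGE_WAIT = "page wait"
    BLOCKED = "blocked"


def virtual_and_real_time(history):
    """Return (virtual time, real time) for a list of (state, duration) pairs.

    Virtual time accumulates only in the running state; real time accumulates
    in every state.
    """
    virtual = sum(d for s, d in history if s is State.RUNNING)
    real = sum(d for _, d in history)
    return virtual, real


history = [
    (State.RUNNING, 40),
    (State.PAGE_WAIT, 1000),   # one traverse time T; the value is illustrative
    (State.RUNNING, 25),
    (State.READY, 10),
    (State.RUNNING, 35),
    (State.BLOCKED, 500),
]
print(virtual_and_real_time(history))
```

In this history the process accumulates 100 vtu of virtual time while 1610 units of real time elapse; the difference is the sum of the page wait, ready, and blocked delays.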

When we talk about the virtual time interval $(t-\tau, t)$ we shall
mean the $\tau$ information references prior to the real time instant $t$.

Because a process is an ordered sequence of information references,
it is often called a sequential process [D11]. In a
multiprocess computer system, many processes may be executed concurrently,
or in parallel. Thus, we may speak of parallel sequential
processes.

We define a computation to be a collection of mutually co- 
operating processes and information, all operating in the same 
name space. In Multics [C8, S2, V2] every computation is a 
single-process computation, since there is a one-to-one cor- 
respondence between a process and a name space; however a pro- 
grammer can execute a program with parallel processes by setting 
up a collection of single-process computations with isomorphic 
name spaces. IBM System 360 [R3], RCA Spectra 70 [O2], and THE
Multiprogrammed System [D11] are other examples of systems using
single-process computations. The Illiac IV [S6] is an example 
of a system using multiprocess computations. 

The constraints among the member processes of a computation 
are, from the resource allocation viewpoint, unspecified and must 
be considered arbitrary. For the very same reasons that compilers 
and programmers cannot specify beforehand the resource needs
of their programs (because of arbitrary timing of parallel pro- 
cesses and data dependence), compilers and programmers cannot 
predict the constraints among parallel processes. 

By the term contemporary computer system we shall mean a 
Multics-like system, characterized by single-process computations. 
Such systems are not geared for a high degree of intra-computation 
parallel programming because in them the tables specifying a 
computation are so ponderous that the cost of spawning new processes
is prohibitive.

We assume that the operating system allocates resources to
computations, rather than to processes individually. Thus a
commitment must be made to grant a computation all the processors 
and all the memory it needs. In a contemporary computer system, 
the notion of scheduling a process is the same as this more 
general notion of scheduling a computation.

2.3. Summary

We have reviewed the basic concepts of the computing en- 
vironment, presuming familiarity with such common notions as 
segments, pages, demand paging, page traffic, virtual computer, 
virtual processor, and virtual memory. Terms whose meaning is
important in this thesis are: 

1. process: a sequence of information references.

2. states of a process: running, ready, page wait, blocked.

3. virtual time: time seen by a running process.

4. computation: a family of cooperating processes and
information within the same name space.

We turn attention in the next chapters to the definition 
and characterization of the working set model for program behavior.

CHAPTER 3



The Working Set Model for Program Behavior 



3.0. Introduction 

We introduce and justify here the most basic concept in 
this thesis: the locality of information references. This is 
the property of program behavior that, during any interval of 
execution, the program favors a subset of its information. A 
working set of information dynamically measures this set of 
favored pages. A working set memory allocation strategy guaran- 
tees each running process that its working set shall be present 
in main memory. We shall show that working set strategies are 
optimum in two senses: minimum cost and minimum sensitivity to 
thrashing. 

First, we will say that a strategy is optimum when it pro- 
duces minimum cost (the product of memory space and time). After 
discussing various strategies, we show that working set strate- 
gies result in minimum cost. The proof is based on certain con- 
vexity properties, which follow from locality, of the cost function. 

Second, we investigate the causes of thrashing, and show 
that working set strategies minimize the possibility of thrashing. 

We conclude the chapter with a survey of the literature, 
best done in the light of the working set model.

3.1. Locality and Working Sets

3.1.1. Definition and Justification 

Throughout this thesis we shall assume that locality is a 
fundamental property of program behavior. Locality is the pro- 
perty that, during any interval of execution, a process will favor 
some of its pages more than others; during disjoint virtual time 
intervals, the set of favored pages may be different. Put an- 
other way, if one observes a process's reference pattern for some 
virtual time interval, he will see that the process does not 
scatter its references uniformly across its information. There 
are at least five factors motivating this assumption: 

1. Sequential instruction streams. Both programmers and
compilers tend to organize sequentially the instructions
that direct the activity of a process; this is especially
true in single-address machines (i.e., those with
a program counter). If a process fetches an instruction
from a given page, it is highly probable that it will
soon fetch another instruction, in sequence, from the
same page.

2. Functional modularity. Program modules are organized
and executed by function. 

3. Content-related data organization. Information is usually
grouped by content into segments, and is normally
referenced that way; thus, references will occur in
clusters to a content-related region in name space.

4. Looping. Programs often loop within a set of pages.

5. People. Realizing that their programs will run on a
paged machine and that page transfers are costly, programmers
tend to organize their algorithms so that activity
is localized within subsets of their information.
Moreover, people have been studying methods of minimizing
interpage references at execution time; see references
[B1, B2, C5, M1, O1, R1].

Experimental evidence suggests that this assumption, locality, 
is a very good assumption. Suppose to the contrary that, during 
every virtual time interval, a process scatters its references 
uniformly over its information. Suppose that a fraction $s$
($0 < s < 1$) of its pages have been placed in main memory. Let
$\mu(s)$ be the fraction of its references the process makes to the
set of pages not in memory; since the references are uniformly
scattered, it follows that

$$
\mu(s) = 1 - s
$$

Experimental evidence, illustrated in Figure 3-1, contradicts
this [B1,V1]. As measured, $\mu(s)$ actually follows some curve that
lies below the curve $\mu(s) = 1 - s$. It has been observed that there
is some number $s_0$ and constant $k > 1$, such that if $s < s_0$ then
$\mu(s) = 1 - ks$; that is, the process is scattering its references
uniformly over only a subset of its information. The numbers
$s_0$ and $k$ depend on the particular program and the particular
storage management rule used to decide what information is to 
reside in main memory. 
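
The contrast between the two curves can be illustrated numerically. The sketch below rests on an interpretation that is our assumption, not a measured result: a constant k > 1 is modeled as uniform scattering over a favored subset holding 1/k of the pages, with the pages in main memory all drawn from that subset, so that the linear form holds for s no larger than 1/k. The function name is hypothetical.

```python
def missing_fraction(s, k=1.0):
    """Fraction of references made to pages not in main memory: 1 - k * s.

    k = 1 models uniform scattering over all of the information; k > 1 models
    uniform scattering over a favored subset holding 1/k of the pages (an
    assumed interpretation). Valid for 0 <= s <= 1/k.
    """
    if not 0.0 <= s <= 1.0 / k:
        raise ValueError("s must lie in [0, 1/k]")
    return 1.0 - k * s


for s in (0.1, 0.2, 0.3):
    print(s, missing_fraction(s), missing_fraction(s, k=2.0))
```

For every s in the valid range the localized curve lies below the uniform one, which is the qualitative behavior reported for measured programs.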

We will therefore assume that locality is a property of 
program behavior. 

We define the working set of information $W_p(t,\tau)$ of process
$p$ at time $t$ to be the set of pages that process $p$ has referenced
during the virtual time interval $(t-\tau, t)$. The idea is illustrated
in Figure 3-2.

Figure 3-1. Evidence supporting locality.

The validity of the working set model rests on the concept
of locality. A working set $W_p(t,\tau)$ measures the set of pages
process $p$ is favoring at time $t$. Assuming that process $p$ is
not likely to abruptly change its set of favored pages, the
working set $W_p(t,\tau)$ constitutes a reliable estimate of $p$'s
immediate memory needs. To put it another way, we are assuming
that, on the average,

$$
\Pr[\text{page } i \text{ referenced next} \mid i \in W_p(t,\tau)] >
\Pr[\text{page } i \text{ referenced next} \mid i \notin W_p(t,\tau)]
$$

The working set parameter $\tau$ should be chosen as small as
possible, and yet assure that $W_p(t,\tau)$ contains $p$'s favored pages.
Thus, $\tau$ may vary from program to program, and from time to time.
We shall discuss details of choosing $\tau$ in Chapter 4.
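
The working set definition translates directly into a computation on a page reference string. The following is a minimal sketch, assuming that virtual time is counted in references and that the string is held as a list indexed from zero; the function name and the reference string are hypothetical.

```python
def working_set(refs, t, tau):
    """Pages among the tau references made just before virtual time t.

    refs[k] is the page referenced at virtual time k (a 0-based convention
    assumed for this sketch), so the window covers refs[t - tau] .. refs[t - 1].
    """
    start = max(0, t - tau)
    return set(refs[start:t])


refs = [1, 2, 1, 3, 2, 1, 4, 4, 5, 4, 5, 5]
print(working_set(refs, t=6, tau=4))    # pages in refs[2:6]
print(working_set(refs, t=12, tau=4))   # pages in refs[8:12]
```

In this example the favored pages shift from pages 1, 2, 3 at time 6 to pages 4, 5 at time 12, and the working set follows the shift. Increasing the window size can only add pages to the set, never remove them.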

We assume that the page size (i.e., the number of words in 
a page) is chosen small enough so that the working set $W_p(t,\tau)$
always consists of at least several pages. Indeed, if in a 
particular computer system we observed that working sets often 
consisted of only one or two pages, we would begin to suspect 
that a smaller page size might result in smaller working sets 
and in smaller memory requirements for programs. 

Intuitively, a working set is the smallest set of infor- 
mation that ought to reside in main memory so that a process can 
operate efficiently. A working set memory management policy is
one that permits a process to be running if and only if there is 
enough uncommitted space in main memory to contain its working set. 
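
The admission rule of a working set memory management policy can be sketched as follows. This is an illustration only: it assumes that working set sizes are already known in pages and that candidates are considered in the order given, which is a choice made for the sketch and not a scheduling rule taken from the text. All names and sizes are hypothetical.

```python
def admit(candidates, capacity, committed=0):
    """Decide which processes may run under a working set policy.

    candidates is a list of (name, working_set_size) pairs; a process may run
    if and only if its working set fits in the uncommitted part of main memory.
    Returns (running, waiting, committed pages).
    """
    running, waiting = [], []
    for name, size in candidates:
        if committed + size <= capacity:
            committed += size
            running.append(name)
        else:
            waiting.append(name)
    return running, waiting, committed


candidates = [("p1", 30), ("p2", 50), ("p3", 40), ("p4", 20)]
print(admit(candidates, capacity=100))
```

With 100 pages of main memory, p1 and p2 commit 80 pages, p3 does not fit in the remaining 20 and must wait, and p4 fits exactly.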

Define the random variable $x$ to be the virtual time interval
between successive references to the same page. These interreference
intervals $x$ are useful for describing certain program
properties, which we will do in detail in Chapter 4. Let
$F_x(u) = \Pr[x < u]$ denote its distribution function and let $\bar{x}$
denote its mean. A working set is the collection of a process's
pages whose current interreference intervals (in virtual time)
satisfy $x < \tau$.
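
Interreference intervals can be measured from a page reference string as sketched below. The sketch assumes that virtual time is counted in references; the function names and the reference string are hypothetical, and the empirical distribution is only an estimate of the distribution function from one observed string.

```python
def interreference_intervals(refs):
    """Virtual time between successive references to the same page."""
    last_seen = {}
    intervals = []
    for time, page in enumerate(refs):
        if page in last_seen:
            intervals.append(time - last_seen[page])
        last_seen[page] = time
    return intervals


def empirical_distribution(intervals, u):
    """Estimate Pr[x < u] as the fraction of observed intervals below u."""
    if not intervals:
        return 0.0
    return sum(1 for x in intervals if x < u) / len(intervals)


refs = [1, 2, 1, 3, 2, 1, 4, 4, 5, 4, 5, 5]
xs = interreference_intervals(refs)
print(xs)
print(sum(xs) / len(xs))              # sample mean of the intervals
print(empirical_distribution(xs, 3))
```

For this string the observed intervals are 2, 3, 3, 1, 2, 2, 1, with a sample mean of 2 references.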

By a program we mean the set of information to which a
process directs its references. There is a relation between the
size of a program and the lengths of the interreference intervals
to its component pages. Let process 1 be associated with program
$P_1$ and process 2 be associated with program $P_2$, and let $P_1$ be
larger than $P_2$. Then process 1 has to scatter its references
across a wider range of pages than process 2, and we expect that
the interreference intervals $x_1$ of process 1 will be longer than
the interreference intervals $x_2$ of process 2. That is, $P_1$ bigger
than $P_2$ implies $\bar{x}_1 > \bar{x}_2$.

3.1.2. Pictorial Representations 

It is useful to develop some pictorial representations for 
the notions of working set and locality. Let $C$ be a computation
and $M$ be the name space used by $C$; we may imagine that elements
of $M$ have been grouped together, by pages. We may associate with
$C$ a process space $P$ whose elements are the processes (sequences
of information references) of $C$. If $C$ is a single-process computation,
$P$ contains just one sequence. If $C$ is a multiprocess
computation, $P$ contains several sequences. In Figure 3-3 we show
a process $p$ in $P$; the directed line suggests the ordering of the
information references constituting $p$; two of the information
references, one at time $t$, the other at time $(t-\tau)$, have been
singled out. We may imagine that $W_p(t,\tau)$ is a projection of the
virtual time interval $(t-\tau, t)$ into $M$. Adjacencies indicated in
$M$ (i.e., the content of $W_p(t,\tau)$) should not be construed as adjacencies
of address values; they are simply adjacencies of references
in virtual time.

Figure 3-3. Association of working set with process.

Figure 3-4 depicts the assumption that the content of
$W_p(t,\tau)$ is not fast-changing. For small time separations $\alpha$, we
expect a large intersection between $W_p(t,\tau)$ and $W_p(t+\alpha,\tau)$. For
large time separations $\beta$ (with $\beta \gg \alpha$ and $\beta \gg \tau$) we do not expect
an intersection between $W_p(t,\tau)$ and $W_p(t+\beta,\tau)$ because $p$ has had
ample opportunity to finish the work of time $t$ by time $(t+\beta)$.
Put another way, we expect a working set $W_p(t,\tau)$ to be a reliable
estimate of $p$'s memory needs only over a short interval.

Figure 3-5 illustrates the situation for a multiprocess computation
$C$. Let $P(C,t)$ denote the processes in $C$ at time $t$ that
are running or in page wait (i.e., receiving the use of resources).
The information that should be in main memory is

$$
W_C(t,\tau) = \bigcup_{p \in P(C,t)} W_p(t,\tau)
$$

Note that some of the working sets may overlap, because processes 
may share information. 
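
As a rough illustration of these definitions, a working set can be
computed directly from a reference string. The sketch below assumes
that virtual time is discrete (one page reference per unit of virtual
time) and takes $W_p(t,\tau)$ to be the pages referenced at times
$t-\tau+1, \ldots, t$; the function names and the two sample reference
strings are illustrative only and do not come from the text.

```python
def working_set(refs, t, tau):
    """Pages referenced during the tau references ending at time t.

    refs[k-1] is the page referenced at virtual time k (k = 1, 2, ...).
    """
    start = max(0, t - tau)
    return set(refs[start:t])


def computation_working_set(process_refs, t, tau):
    """Union of the working sets of all processes of a computation."""
    pages = set()
    for refs in process_refs:
        pages |= working_set(refs, t, tau)
    return pages


# Two hypothetical processes; they share pages "a" and "d".
refs_p = ["a", "b", "a", "c", "a", "b", "d", "d", "e", "d"]
refs_q = ["x", "x", "a", "x", "y", "y", "x", "d", "y", "x"]

print(sorted(working_set(refs_p, t=6, tau=4)))
print(sorted(computation_working_set([refs_p, refs_q], t=10, tau=3)))
```

Because the union is taken over sets, a page shared by several
processes is counted once, which is the overlap noted above.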

Note further that P(C,t) may be regarded as a working set
of processes in the process space P.

Figure 3-4. Time movement of a working set.

Figure 3-5. Working sets for multiprocess computation C.

3.1.3. Interactions

The foregoing discussion deals with locality concepts during 
virtual time intervals that contain no interactions. An inter- 
action is an instant in virtual time at which the process stops 
to wait for a message. What happens to our definitions if the 
virtual time interval contains an interaction? 

When a process stops (blocks) for an interaction, it seeks
a message or signal from another process, from a device, or from
a user at a console. An interaction has two properties of
interest to us:

1. The process enters the blocked state where it may remain, 
unpredictably, for a long time. 

2. The message received by the process may affect its 
behavior following the interaction. 

This second property means that, if $t_i$ is an interaction instant,
the working set $W_p(t_i,\tau)$ may not be a good estimate of any
working set $W_p(t,\tau)$ for $t > t_i$, because the message may
seriously alter p's behavior.

What we will do is assume that the working sets before and 
after an interaction intersect, though not completely. We believe 
that the expected size of the intersection will tend to decrease 
with long blocked-intervals, because in longer time intervals, 
for example, a user will have more opportunity to change his mind 
and alter the behavior of his process. Conversely, the shorter 
the duration of a blocked-interval, the greater the expected size 
of the intersection between the working set before and after the 
interaction. 



An example is helpful. Figure 3-6 illustrates a program
organization likely to be typical of modular, interactive programs.
The user sends requests to the interface procedure A; having
interpreted the request, A calls on one of the procedures
$B_1, \ldots, B_n$ to perform an operation on the data D. The called
B-procedure then returns to A for the next user request.
Interactions occur whenever the process enters A to await a message.
A program organization such as this might be used (for example)
in an editing program. Just before the interaction, the working
set will contain A, D, and one B-procedure. Just after the
interaction, the working set will contain A. The intersection is
just A.

A study of intersections of working sets before and after 
interactions is needed in order to assess the value of look-ahead 
when a process unblocks. 

Because we are not interested in input-output allocation
in this thesis, we will no longer be concerned with the effects
of interactions on program behavior; from now on we assume that
virtual time intervals contain no interactions. This problem
has been studied in reference [D5].

Figure 3-6. Organization of an interactive modular program.

3.2. Convexity [W2, p. 563]

We shall prove a theorem about convex functions which will
be of great importance in the following sections.

A function f(x) is strictly convex on an interval I if its
second derivative is negative:

$$f''(x) < 0, \quad x \in I$$

and f(x) is strictly concave if its second derivative is positive.
(This usage, in which "convex" means convex upward, is the reverse
of the convention now common, under which a function with negative
second derivative is called concave.) If the second derivative is
zero on an interval, we regard the function as being either convex
or concave on that interval. Since f is convex if and only if -f is
concave, we may restrict attention to convex functions. Figure 3-7
shows a strictly convex function; note that every line segment
connecting two points on the curve lies below the curve.



Theorem 3.1. Suppose x is a random variable on an interval I,
where its probability density function $p_x(u)$ satisfies

$$\int_I p_x(u)\,du = 1$$

$$\int_I u\,p_x(u)\,du = \bar{x}$$

Suppose also that f is a strictly convex function on I.
Then

$$\overline{f(x)} \le f(\bar{x})$$

and equality holds if and only if $p_x(u) = \delta(u-\bar{x})$ (the
impulse function).

Proof. Since $\bar{x}$ is a fixed number we may expand f(x) around the
point $\bar{x}$ using Taylor's expansion:

$$f(x) = f(\bar{x}) + (x-\bar{x})\,f'(\bar{x}) + \frac{(x-\bar{x})^2}{2}\,f''(z), \quad \text{some } z \in I$$

Since $f''(z) < 0$ for $z \in I$,

$$f(x) \le f(\bar{x}) + (x-\bar{x})\,f'(\bar{x})$$

Taking expectations on both sides, and noting that $f(\bar{x})$
and $f'(\bar{x})$ are constant,

$$\overline{f(x)} \le f(\bar{x}) + \overline{(x-\bar{x})}\,f'(\bar{x})$$

but since $\overline{(x-\bar{x})} = 0$, we have

$$\overline{f(x)} \le f(\bar{x})$$

The equality clearly holds if and only if $x = \bar{x}$ with
probability 1.

Q.E.D.
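
Theorem 3.1 can be checked numerically for a discrete random
variable. The sketch below is only an illustration under assumed
values: it takes $f(x) = \sqrt{x}$, whose second derivative is negative
for $x > 0$ and which is therefore convex in the sense used here, and a
three-point distribution chosen arbitrarily. The helper names are
hypothetical.

```python
import math


def expectation(values, weights):
    """Weighted mean of values; weights are probabilities summing to 1."""
    return sum(v * w for v, w in zip(values, weights))


def both_sides(f, points, weights):
    """Return (mean of f(x), f(mean of x)) for a discrete distribution."""
    x_bar = expectation(points, weights)
    f_bar = expectation([f(x) for x in points], weights)
    return f_bar, f(x_bar)


# f(x) = sqrt(x) has f''(x) = -x**(-1.5) / 4 < 0 for x > 0.
f_bar, f_at_mean = both_sides(math.sqrt, [1.0, 4.0, 9.0], [0.2, 0.5, 0.3])
print(f_bar, f_at_mean)          # the first value should not exceed the second

# All probability on a single point (the impulse case): the sides agree.
g_bar, g_at_mean = both_sides(math.sqrt, [4.0], [1.0])
print(g_bar, g_at_mean)
```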

In Figure 3-7 we show a geometric interpretation of the theorem
for the simple case

$$p_x(u) = \begin{cases} 1/2 & u = \bar{x} \pm \epsilon \\ 0 & \text{otherwise} \end{cases}$$

and it is clear that

$$\overline{f(x)} = \frac{f(\bar{x}+\epsilon) + f(\bar{x}-\epsilon)}{2} \le f(\bar{x})$$

Observe from the definition and the figure that, for f to be
convex on an interval I, it is sufficient that

$$f(x+\epsilon) + f(x-\epsilon) \le 2\,f(x)$$

for all choices of x and $\epsilon$ such that $(x+\epsilon)$ and
$(x-\epsilon)$ are in I.

Figure 3-7. Illustrating convexity theorem.

3.3. Working Set Size

Let $W(t,\tau)$ be a working set. Define the working set size
$\omega(t,\tau)$ to be:

$$\omega(t,\tau) = \text{number of pages in } W(t,\tau) = |W(t,\tau)|$$

We assume that the working set size $\omega(t,\tau)$ is a stationary
stochastic process, so that the time expectation
$\overline{\omega(t,\tau)}$ is independent of t, and we may write

$$w(\tau) = \overline{\omega(t,\tau)}$$

where we understand overbar to mean time expectation.



Theorem 3.2. The expected working set size $w(\tau)$ has these
properties:

1. $w(\tau) \le \tau$

2. $w(0) = 0$

3. $w(\tau+\lambda) \ge w(\tau)$, $\lambda > 0$ (non-decreasing)

4. $w(\tau)$ is convex.

Proof: Since the maximum number of distinct references that can
occur in $\tau$ vtu is $\tau$, we have $\omega(t,\tau) \le \tau$ and
hence $w(\tau) \le \tau$. That $w(0) = 0$ is clear since no pages can
be referenced in zero time. That $w(\tau+\lambda) \ge w(\tau)$ is also
clear since more pages can be referenced in longer intervals. To
show that $w(\tau)$ is convex, we will show that for all choices of
$\tau$ and $\epsilon$ such that $(\tau-\epsilon) \ge 0$,

$$2\,w(\tau) \ge w(\tau+\epsilon) + w(\tau-\epsilon)$$

So, let $\tau$ and $\epsilon$ be arbitrarily given, with
$(\tau-\epsilon) \ge 0$. Refer to Figure 3-8. Observe that

$$W(t,\tau) \cup W(t-\tau,\tau) = W(t,\tau-\epsilon) \cup W(t-(\tau-\epsilon),\tau+\epsilon)$$

Using $|X \cup Y| = |X| + |Y| - |X \cap Y|$ for any sets X and
Y, we have

$$\omega(t,\tau) + \omega(t-\tau,\tau) = \omega(t,\tau-\epsilon) + \omega(t-(\tau-\epsilon),\tau+\epsilon) + |A| - |B|$$

where

$$A = W(t,\tau) \cap W(t-\tau,\tau)$$

$$B = W(t,\tau-\epsilon) \cap W(t-(\tau-\epsilon),\tau+\epsilon)$$

Taking the time expectation on both sides,

$$w(\tau) + w(\tau) = w(\tau-\epsilon) + w(\tau+\epsilon) + \overline{|A|} - \overline{|B|}$$

We claim $\overline{|A|} - \overline{|B|} \ge 0$. To see this, note
that $|A| \le \omega(t,\tau)$ and $|B| \le \omega(t,\tau-\epsilon)$, so
$|A|$ is potentially bigger than $|B|$. During the operation of
averaging over all t, any page that appears in B must also appear
in A. Thus, on the average at least as many pages appear in A as
in B; hence we have $\overline{|A|} - \overline{|B|} \ge 0$ and the
required inequality follows.

QED.

Note that properties 1-3 of Theorem 3.2 apply also to the random
variable $\omega(t,\tau)$ itself, but property 4 applies only to
$w(\tau)$.
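
The properties of Theorem 3.2 can be observed on a small example.
The sketch below assumes discrete virtual time and estimates
$w(\tau)$ as the average of $\omega(t,\tau)$ over the available
values of t; the reference string (a process cycling through five
pages) and the function names are illustrative assumptions. For
this particular string the estimate is simply $\min(\tau, 5)$.

```python
def omega(refs, t, tau):
    """Working set size: distinct pages among the tau references ending at t."""
    return len(set(refs[max(0, t - tau):t]))


def expected_size(refs, tau, tau_max):
    """Time average of omega(t, tau) over t = tau_max, ..., len(refs)."""
    times = range(tau_max, len(refs) + 1)
    return sum(omega(refs, t, tau) for t in times) / len(times)


# Hypothetical process that cycles endlessly through 5 pages.
refs = [k % 5 for k in range(200)]
tau_max = 12
w = [expected_size(refs, tau, tau_max) for tau in range(tau_max + 1)]

for tau in range(1, tau_max):
    assert w[tau] <= tau                          # property 1
    assert w[tau + 1] >= w[tau]                   # property 3
    assert 2 * w[tau] >= w[tau + 1] + w[tau - 1]  # property 4
print(w)
```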

In Figure 3-9 we have sketched $w(\tau)$ for two kinds of
program. A hard, or incompressible, program is one with a
well-defined set of favored pages; a soft, or compressible, program
is one with a fuzzily-defined set of favored pages. A hard program
tends to scatter almost all of its references uniformly over some
set of $\tau_0$ favored pages, so that for any interval
$\tau < \tau_0$ we expect to see mostly distinct pages referenced,
and $w(\tau)$ increases (almost) linearly with $\tau$. For such
programs we want to choose $\tau > \tau_0$. An example of a hard
program is the so-called stream-processing program, whose algorithm
is contained wholly in a set of $\tau_0$ pages, and which makes
occasional references to a sequence of data pages; after only a few
references, each data page is discarded forever. If
$\tau \gg \tau_0$, the working set will contain many useless pages.
Choosing $\tau$ close to $\tau_0$ will not affect the program's
operating efficiency, but will diminish the amount of memory it
occupies.

Figure 3-9. Expected working set size.

Recall the definition of the interreference intervals x
(the vt intervals between successive references to the same page),
with distribution function $F_x(u) = \Pr[x \le u]$, and density
function $f_x(u) = \frac{d}{du} F_x(u)$. We shall assume $F_x(u)$ is
convex. This is not unreasonable since it requires only that
$f_x(u)$ be decreasing, which is consistent with the concept of
locality. In fact, there is strong evidence that this type of
interarrival distribution is modelled nicely by a hyperexponential
distribution [C4,F4], which is convex.

We define the missing-page probability $\lambda(\tau)$ to be the
probability that a process directs its next reference to a page not
in the working set $W(t,\tau)$; under a working set memory allocation
strategy, such a page may be missing from main memory.

Theorem 3.3. Let $\lambda(\tau)$ = Pr[process references a page not in
$W(t,\tau)$]. Then $\lambda(\tau) = 1 - F_x(\tau)$.

Proof: The probability the page referenced is not in $W(t,\tau)$ is
just the probability its most recent interreference interval
satisfies $x > \tau$, so
$\lambda(\tau) = \Pr[x > \tau] = 1 - F_x(\tau)$.

QED.
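
Theorem 3.3 can be checked on a finite reference string. The sketch
below assumes discrete virtual time, treats a first reference to a
page as having an infinite interreference interval, and compares two
independently computed quantities: the fraction of references that
fall outside the preceding window of $\tau$ references, and
$1 - F_x(\tau)$ obtained from the empirical distribution of the
intervals. The names and the sample string are illustrative.

```python
from fractions import Fraction


def interreference_intervals(refs):
    """Distance back to the previous reference to the same page.

    None marks a first reference, treated here as an infinite interval.
    """
    last = {}
    intervals = []
    for k, page in enumerate(refs):
        intervals.append(k - last[page] if page in last else None)
        last[page] = k
    return intervals


def F(intervals, u):
    """Empirical distribution function Pr[x <= u]."""
    hits = sum(1 for x in intervals if x is not None and x <= u)
    return Fraction(hits, len(intervals))


def missing_page_probability(refs, tau):
    """Fraction of references not in the window of the tau preceding references."""
    misses = sum(1 for k, page in enumerate(refs)
                 if page not in set(refs[max(0, k - tau):k]))
    return Fraction(misses, len(refs))


refs = list("abacabddedabcaab")
intervals = interreference_intervals(refs)
for tau in range(10):
    assert missing_page_probability(refs, tau) == 1 - F(intervals, tau)
```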

We will need the following two theorems to prove that a working
set strategy is optimum.

Theorem 3.4. Suppose $\tau$ is varied on some interval, with mean
$\bar{\tau}$. Then the average missing-page probability is increased:

$$\overline{\lambda(\tau)} \ge \lambda(\bar{\tau})$$

Proof: Since we assume $F_x(u)$ is convex,
$\lambda(\tau) = 1 - F_x(\tau)$ is concave, and by Theorem 3.1, we have
$\overline{\lambda(\tau)} \ge \lambda(\bar{\tau})$.

QED.

Theorem 3.5. Suppose $\tau$ is varied on some interval, with mean
$\bar{\tau}$. Then the average working set size is decreased:

$$\overline{w(\tau)} \le w(\bar{\tau})$$

Proof: By Theorem 3.2, $w(\tau)$ is convex. By Theorem 3.1, we have
$\overline{w(\tau)} \le w(\bar{\tau})$.

QED.
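
A small numerical case may make Theorem 3.4 concrete. The sketch
below assumes a two-stage hyperexponential $F_x$ with arbitrarily
chosen parameters, so that
$\lambda(\tau) = a e^{-b\tau} + (1-a) e^{-c\tau}$, and lets $\tau$
alternate with equal probability between two values. The parameter
values and names are assumptions made for illustration.

```python
import math


def lam(tau, a=0.7, b=1.0, c=0.05):
    """Missing-page probability 1 - F_x(tau) for a two-stage
    hyperexponential F_x; a, b, c are illustrative parameters."""
    return a * math.exp(-b * tau) + (1 - a) * math.exp(-c * tau)


taus = [2.0, 10.0]        # tau alternates between these two values
probs = [0.5, 0.5]

tau_bar = sum(p * t for p, t in zip(probs, taus))        # mean of tau
lam_avg = sum(p * lam(t) for p, t in zip(probs, taus))   # mean of lambda(tau)

print(tau_bar, lam_avg, lam(tau_bar))
```

With these assumed values the averaged missing-page probability
exceeds the value at the mean, as the theorem states; holding
$\tau$ fixed at its mean would have produced fewer missing pages.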



Varying $\tau$ increases the probability that a missing page will be
referenced, as well as diminishing the average memory share held
by a process. That is, varying $\tau$ with mean $\bar{\tau}$ on some
interval has the same effect as holding $\tau$ fixed at some
$\tau_0 < \bar{\tau}$ such that

$$w(\tau_0) = \overline{w(\tau)}$$

3.4. Storage Management Policies

Storage management policies for multiprogrammed memories
may be regarded as operating in two provinces:

1. Fetching (page in): Locate the required page in auxiliary
memory, and load it into main memory; turn the in-core bit of
the corresponding page table entry ON.

2. Replacement (page out): Remove some page from main
memory, turn the in-core bit of the corresponding page
table entry OFF. The policy rule that decides which
page to remove is called the replacement rule.

Management algorithms may be classified according to their me- 
thods of fetching and replacement. 

Fetch strategies may load pages before they are needed 
(pre-paging) , at the moment they are needed (demand paging), or 
even later. Many strategies use demand paging; that is, no ac- 
tion is taken to bring a page into main memory until some process 
attempts a reference to it. Demand paging is usually preferred 
to pre-paging because it is much cheaper to implement, and be- 
cause it is not clear that pre-paging improves performance sig- 
nificantly. As we have stressed, advance information is often 
non-existent because there is no reliable source of allocation 
information. In fact the only major argument favoring pre-paging 
is the possibility of moving large contiguous blocks of pages 
from auxiliary memory so that the accumulated traverse time is 
reduced in the long run. Although traverse time reduction is 
(in some sense) a valid argument for pre-paging, we feel that 
it is also a more powerful argument for better, faster, aux- 
iliary memories. 

It may be argued that a working set, a supposedly reliable 
estimate of a process's immediate memory needs, is the ideal
set of pages to pre-load. Because the records required to keep
track explicitly of which pages belong to which working sets may 
easily become so complicated that any benefits resulting from 
pre-paging may be lost, we prefer to assume that fetching occurs 
on demand only, via the page fault mechanism. 

The major problem in memory management is not deciding which
pages to load; it is deciding which pages to replace. A storage
management policy should attempt to keep in main memory the pages
most likely to be used. Thus, the best choice for replacement
is the page with the least likelihood of being reused immediately.
Debate has arisen over which replacement, or page-turning,
strategy is best.

The cost of operating a program under a given strategy will 
be defined (Section 3.5.1) to be the amount of memory used times 
the duration of such use. We will say that the optimum strategy 
is the one that results in the lowest cost. In Section 3.5.1 
we will show that low missing-page probability is equivalent to 
low cost. We shall therefore use the missing-page probability 
as a measure of performance for a paging policy; this will be 
done in Sections 3.5.2 and 3.5.3. 



If a page has been modified since being placed in main memory, 
replacing it involves transferring it into auxiliary memory; 
an unmodified page is simply overwritten, provided there is 
a copy in auxiliary memory.

Allocation of pages in multiprogrammed memories can be handled
on either a fixed or variable memory basis:

1. Fixed share. Before being run, a program is granted a
share of the memory for its private use.

2. Variable share. Programs are allowed to compete freely
for memory space. In principle, more aggressive programs
should be able to obtain a greater share of the
memory. In principle, as a program expands or contracts,
its share increases or decreases accordingly.

In Section 3.5.2 we shall prove that variable-share strategies 
yield smaller missing-page probabilities than fixed-share stra- 
tegies, all other things being equal. Policy rules for replace- 
ment (which may be used with either fixed or variable share basic 
strategies) fall into the following three classes, ordered in 
terms of the intrinsic increase in the logic required to im- 
plement: 

1. Static rules, which use no information about page use;
these rules are very simple to implement.

2. Usage rules, which use information about page use,
generally measuring time intervals since the last reference
to each page.

3. Demand rules, which attempt to predict, on the basis
of recent reference patterns, the set of pages most likely
to be used immediately. A program is given more or less
space according to its demand for space.

We shall show in Section 3.5.2 that the static rules lead to
the highest missing-page probabilities, the demand rules the
lowest.

There are two static rules of interest:

1. Random (RAND). Whenever a fresh page of memory is needed,
a page is selected at random to be replaced. Implementation
is simple, requiring only a random-number generator.

2. First-in, First-out (FIFO). Whenever a fresh page of
memory is needed, the page least recently paged in is
retired and another page brought in to fill the newly
vacated slot. Whereas RAND requires a random number
generator, FIFO requires only a counter, and implementation
is even simpler, as follows. The pages of main
memory are regarded as a cyclic group; suppose the M
pages of main memory are numbered 0, 1, ..., (M-1) and a
pointer k indicates that the kth page was most recently
paged in. When a fresh page is needed, [(k+1) mod M] → k,
and page k is retired.
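
The cyclic-pointer form of FIFO described above can be sketched in a
few lines. This is a minimal illustration, not a definitive
implementation: the class name, the use of `None` for an empty page
frame, and the choice to start the pointer at M-1 (so that the first
page loaded goes into frame 0) are assumptions of the sketch.

```python
class FifoMemory:
    """Main memory of M page frames managed by the FIFO rule."""

    def __init__(self, M):
        self.frames = [None] * M   # page held in each of the M frames
        self.k = M - 1             # frame most recently paged into
        self.faults = 0

    def reference(self, page):
        """Reference a page; return True if a page fault occurred."""
        if page in self.frames:
            return False
        self.k = (self.k + 1) % len(self.frames)   # [(k+1) mod M] -> k
        self.frames[self.k] = page                 # frame k is retired, refilled
        self.faults += 1
        return True


memory = FifoMemory(3)
for page in [1, 2, 3, 1, 4, 1, 2]:
    memory.reference(page)
print(memory.frames, memory.faults)
```

Note that the reference to page 1 just before page 4 arrives does
not protect page 1 from removal; FIFO uses no information about
page use.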

The principal argument for these two rules is their simplicity 
of implementation. Yet the experimental evidence [B1,B2,V1] 
indicates that usage rules, despite higher overhead, significantly 
outperform the static rules. 

There are two usage rules of interest, LRU and FINUFO: 

3. Least recently used (LRU). Whenever a fresh page of
memory is needed, the page unreferenced for the longest
time is removed. Each page table entry contains a use
bit, set ON each time the page is referenced. At periodic
intervals, all page table entries are searched,
use bits reset, and usage records updated.
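
For comparison with FIFO, the LRU rule can be sketched as follows.
The sketch assumes an idealized LRU that knows the exact order of
references, rather than the periodic sampling of use bits described
above; the class name is hypothetical and Python's
`collections.OrderedDict` is used only as a convenient way to keep
pages in order of last use.

```python
from collections import OrderedDict


class LruMemory:
    """Main memory of M page frames managed by an idealized LRU rule."""

    def __init__(self, M):
        self.M = M
        self.pages = OrderedDict()   # least recently used page comes first
        self.faults = 0

    def reference(self, page):
        """Reference a page; return True if a page fault occurred."""
        if page in self.pages:
            self.pages.move_to_end(page)    # now the most recently used
            return False
        if len(self.pages) >= self.M:
            self.pages.popitem(last=False)  # remove the least recently used
        self.pages[page] = True
        self.faults += 1
        return True


memory = LruMemory(3)
for page in [1, 2, 3, 1, 4, 1, 2]:
    memory.reference(page)
print(list(memory.pages), memory.faults)
```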

Unfortunately, implementation of an LRU rule may become
complicated, and it is not clear whether an overall improvement would
result. A very interesting rule combines the simplicity of FIFO
with the sophistication of LRU:

4. First-in, Not-used, first-out (FINUFO). Implementation
is almost exactly that of FIFO, but now the use bits
come into play. Let k be the pointer that cycles through
the M pages of memory. Whenever a fresh page is needed,
k is incremented until a page is found with use bit OFF;
this page is retired. When k passes a page with use
bit ON, the use bit is turned OFF.

It is interesting to note that FINUFO is much closer to a demand
rule than to a usage rule, because when demand for main memory
is high, FINUFO will have difficulty in finding a page to remove
(many use bits ON). Whereas FIFO and LRU will always find a page
to remove, FINUFO may not. It is therefore a stable rule.
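
The FINUFO pointer sweep can be sketched in the same style as FIFO.
The sketch makes two assumptions not stated in the text: a newly
loaded page has its use bit turned ON (it is referenced at once), and
the pointer starts at M-1 so that the first page goes into frame 0.
The class name is hypothetical.

```python
class FinufoMemory:
    """Main memory of M page frames managed by the FINUFO rule."""

    def __init__(self, M):
        self.frames = [None] * M    # page held in each frame
        self.use = [False] * M      # use bit of each frame
        self.k = M - 1              # cyclic pointer
        self.faults = 0

    def reference(self, page):
        """Reference a page; return True if a page fault occurred."""
        if page in self.frames:
            self.use[self.frames.index(page)] = True
            return False
        while True:
            self.k = (self.k + 1) % len(self.frames)
            if not self.use[self.k]:
                break                   # use bit OFF: retire this page
            self.use[self.k] = False    # pointer passes: turn use bit OFF
        self.frames[self.k] = page
        self.use[self.k] = True         # assumption: loading counts as a use
        self.faults += 1
        return True


memory = FinufoMemory(3)
for page in [1, 2, 3, 4, 2, 5]:
    memory.reference(page)
print(memory.frames, memory.faults)
```

In this run the reference to page 2 turns its use bit back ON, so
the pointer passes over it and page 3 is retired instead; FIFO
would have retired page 2.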
Another usage rule, primarily of academic interest, is: 

5. ATLAS loop-detection. The Ferranti ATLAS computer [K1]
had a paging strategy that attempted to detect loop 
behavior in page reference patterns, then minimize 
page traffic by maximizing the time between page trans- 
fers; that is, by removing pages not expected to be need- 
ed for the longest time. Performance was satisfactory 
for programs exhibiting loop behavior; unsatisfactory 
for programs exhibiting aperiodic reference patterns, 
because the algorithm attempted to predict loops when 
there were none. Implementation was costly. 

Footnote: Reported (by J. H. Saltzer) to be used in Multics.

Two kinds of demand rules warrant investigation, biased
rules and working set rules:

6. Biased replacement rules. In round-robin fashion, each
program is favored for an interval of time. During its
favored interval, none of its pages are removed and
it may acquire new pages without hindrance. After its
favored interval, it will be forced to give up pages in
deference to other programs. When a page is to be retired,
any of the rules discussed above may be applied
to the non-favored pages.

Belady [B2] reports that a biased FIFO rule on the M44/44X
computer improved performance significantly. The arguments given in
Section 3.5.4 may be used to show that biased rules will perform
better than non-biased rules (except the working set rule).
Intuitively this makes sense, because large programs will have
opportunity to expand into memory shares more matched with their
needs.

7. Working set (WS). Guarantees that a computation receives
the use of a processor if and only if there is
enough uncommitted space in memory to contain its working
set pages. Thus, every page belonging to the working
set of some running process must be kept in main memory.
Pages in no working set are subject to removal, though
need not be removed until the space is needed. A computation
acquires more or less memory in accordance with
fluctuations in its working set size. Should the totality
of working sets exceed memory, some program (perhaps
the one present there for the longest time) is removed
in order to clear space.

This complicates implementation, because now identification of
pages by program is required.

These seven are a portfolio of the most interesting and most
important rules.
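
The admission condition of the WS rule (a computation runs if and
only if its working set fits in the uncommitted space) can be
sketched as a simple bookkeeping step. The order in which
computations are considered, the function name, and the sample sizes
are assumptions of this sketch; the text does not prescribe them.

```python
def admit(working_set_sizes, M):
    """Decide which computations may run in a memory of M pages.

    working_set_sizes is a list of (name, working set size) pairs,
    considered in the order given. Returns the computations admitted,
    those held back, and the space left uncommitted.
    """
    running, waiting = [], []
    uncommitted = M
    for name, size in working_set_sizes:
        if size <= uncommitted:        # enough room for the whole working set
            running.append(name)
            uncommitted -= size
        else:
            waiting.append(name)
    return running, waiting, uncommitted


sizes = [("c1", 30), ("c2", 50), ("c3", 40), ("c4", 15)]
running, waiting, uncommitted = admit(sizes, M=100)
print(running, waiting, uncommitted)
```

In practice the sizes fluctuate with time, so the decision would
have to be repeated as working sets grow and shrink.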

In the next section we will compare all these strategies
and show that WS is optimum.

3.5. Working Set Strategies are Optimum

The following demonstration that working set strategies are 
optimum is based on the concept of locality, because we shall 
rate strategies by their ability to retain a process's favored 
pages in main memory. 

First we show that the missing-page probability is a valid 
measure of performance for a paging policy. Then, using the 
convexity properties of working set size and missing-page prob- 
ability (Theorems 3.4 and 3.5) we show that the working set 
strategies have the lowest missing-page probabilities. For ease 
in discussion, we start by studying the algorithms operating on 
one program in a cramped memory. We then generalize to the case 
that many programs reside together in memory. 



3.5.1. The Cost of a Strategy 

Suppose h(t) is the number of pages of memory held by a cer- 
tain program at time t. Define the cost C(I) for memory usage 
over the real time interval I to be 

(3.5.1)  $$C(I) = \int_I h(t)\,dt$$

which is the space-time product of memory usage. We will say
that the best strategy is the one that produces minimum cost.
C(I) includes page wait times in the interval I; even though the
process is not running, its information still occupies space
during page waits.

For convenience, we shall deal with the cost per unit
virtual time, G, which we define to be

(3.5.2)  $$G = \frac{C(I)}{v(I)}$$

where v(I) is the amount of virtual time contained in I. We
are interested in the paging policy with smallest G.

We consider a certain program consisting of r pages, which
is operating in a space of s memory pages. The missing-page
probability $\mu(s)$ is the probability that the process references
a page not in main memory, when s pages are in main memory;
clearly, $\mu(s)$ depends on the paging algorithm.

It is important to note that $\mu(s)$ is an average. To measure
$\mu(s)$ experimentally, one would run a program in a space s, using
a given paging algorithm, for a virtual time interval of length V.
If one observed R references to pages not in memory, one would
assign $\mu(s) = R/V$. Thus, $\mu(s)$ is also the rate at which page
faults occur, for in a virtual time interval of length V, we expect
$V\mu(s)$ page faults.

We have sketched $\mu(s)$ in Figure 3-10 for two strategies,
which we shall call 1 and 2 (cf. Figure 3-1).

Footnote: When the missing-page probability is a function of memory
space s, we will write it as $\mu(s)$. When it depends only on the
working set parameter $\tau$, we will write it as $\lambda(\tau)$.

Figure 3-10. Missing-page probability for two strategies.

The cost per unit virtual time G(s) of a strategy, as a
function of the number s of pages in main memory, is related to
$\mu(s)$ as follows. Suppose the program has executed for V vtu,
and suppose $\mu(s)$ is constant over this interval. The expected
number of page waits is $V\mu(s)$, and so the total elapsed real
time is

(3.5.3)  $$t_r = V + V\mu(s)T = V(1+\mu(s)T)$$

where each page wait costs one traverse time T. The memory
space s is constant for this interval, so the cost is

(3.5.4)  $$s\,t_r = sV(1+\mu(s)T)$$

and the cost per unit virtual time is

(3.5.5)  $$G(s) = \frac{s\,t_r}{V} = s(1+\mu(s)T)$$
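
Equation (3.5.5) is easy to tabulate. The sketch below is purely
illustrative: the two missing-page curves, the traverse time T (in
units of virtual time), and the program size r are invented values,
chosen only so that curve 2 lies below curve 1 everywhere; they are
not measurements and do not come from the text.

```python
import math

T = 1000.0     # assumed traverse time, in units of virtual time
r = 200        # assumed program size in pages


def G(s, mu):
    """Cost per unit virtual time, equation (3.5.5)."""
    return s * (1.0 + mu(s) * T)


def mu_1(s):
    """Hypothetical missing-page probability under strategy 1."""
    return math.exp(-s / 10.0)


def mu_2(s):
    """Hypothetical missing-page probability under strategy 2."""
    return math.exp(-s / 5.0)


def optimum_space(mu):
    """Memory size s (1 <= s <= r) that minimizes G(s)."""
    return min(range(1, r + 1), key=lambda s: G(s, mu))


s0_1 = optimum_space(mu_1)
s0_2 = optimum_space(mu_2)
print(s0_1, G(s0_1, mu_1))
print(s0_2, G(s0_2, mu_2))
```

For these assumed curves the strategy with the smaller missing-page
probability has both the smaller cost at every s and the smaller
optimum memory size.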

We have sketched G(s) for the two strategies, 1 and 2, in
Figure 3-11. Because $\mu(s)$ for strategy 2 is flatter than for
strategy 1, the optimum memory size $s_0$ is smaller for strategy 2
than for strategy 1. Moreover, because

(3.5.6)  $$\mu_2(s) \le \mu_1(s)$$

it follows that

(3.5.7)  $$G_2(s) \le G_1(s)$$

We therefore obtain two important conclusions. First,
the smaller the average missing-page probability, the cheaper
is the policy. Missing-page probability is therefore a valid
performance measure. Second, the smaller the average missing-page
probability, the smaller is the optimum memory space $s_0$.
Hence, under better strategies, more programs can be placed in
memory.

If the working set parameter $\tau$ is properly chosen (Chapter 4),
it is possible to cause a working set strategy to operate at
or near its current value of $s_0$.

3.5.2. Single Program Case

Imagine an experiment (depicted in Figure 3-12) in which
the same program is executed in memories A, B, C, and D; memory
A uses RAND replacement, B uses FIFO, C uses LRU, and D uses
FINUFO. Let $\mu(s)$ denote the missing-page probability when a
fraction s ($0 \le s \le 1$) of the program is in memory. We claim
that

$$\mu_A(s) \ge \mu_C(s)$$
$$\mu_B(s) \ge \mu_C(s)$$
$$\mu_D(s) \ge \mu_C(s)$$

Since we assume locality is a basic program property, the
question is: How well do RAND, FIFO, LRU, and FINUFO keep a program's
favored pages in memory?

To answer the question we imagine that we are trying to
measure $\mu_A(s)$, $\mu_B(s)$, $\mu_C(s)$, and $\mu_D(s)$ by
observing the rate at which page faults occur.

Since a process references its favored pages most often,
we expect that the least recently referenced pages in memory are
the least favored; thus, LRU tends to retain favored pages. RAND
may very easily select a favored page, even one that LRU would
not; thus, we expect RAND to induce more page faults over an
execution interval than LRU, and so $\mu_A(s) \ge \mu_C(s)$. Under
FIFO, it is certain that every page will eventually be removed;
thus, we expect FIFO to induce more page faults over an execution
interval than LRU, and so $\mu_B(s) \ge \mu_C(s)$.

We make no claim that one or the other of RAND and FIFO
is better. On the one hand, there are cases in which RAND is
better than FIFO (e.g., an unchanging set of favored pages:
FIFO eventually removes every one of them, whereas RAND may not).
On the other hand, there are cases for which FIFO is better than
RAND (e.g., a process which changes its set of favored pages
completely before FIFO completes a cycle: FIFO removes the
old pages first, whereas RAND may select some of the new favored
pages). FIFO, of course, is cheaper to implement.

Figure 3-12. Conceptual experiment to compare strategies.

Figure 3-13. Missing-page probabilities for the strategies.

The FINUFO algorithm operates nearly the same as LRU, there
being little difference, except in cost of implementation. If
all the use bits have been set, FINUFO will do worse than LRU
for the following reason. On its first cycle through memory,
FINUFO finds all use bits ON, and clears them; since the process
is in page wait, the use bits remain OFF, so on its second cycle
through memory, FINUFO will select the first page whose use bit
it cleared on the first cycle. Thus, FINUFO essentially selects
a page at random. Assuming correlation between age and usefulness,
we expect that there are situations in which LRU induces
fewer page faults during an execution interval than FINUFO, and
so on the average $\mu_D(s) \ge \mu_C(s)$.

In Figure 3-13 we show $\mu(s)$ sketched for memories A, B,
C, and D in our conceptual experiment. In Region I, all of the
policies behave equally poorly, because too few pages are in
memory. In Region II, the differences become apparent. At s=1,
all of the policies are again the same, since the program is
entirely present in memory.

To complete the discussion, we must show that WS is better
than LRU.

We imagine a second experiment, to compare LRU and WS,
shown in Figure 3-14. Memory A, of size M, is run under LRU.
Memory B, of variable size, runs under WS with $\tau$ fixed at
$\tau_0$ such that the average working set size is $w(\tau_0) = M$.
Note that LRU and WS are very similar in operation: LRU keeps the M
most recently used pages in memory, whereas WS keeps the
$\omega(t,\tau_0)$ most recently used pages in memory.

Figure 3-15 compares the behavior of the two policies.
Figure 3-15a shows $\tau$ fixed at $\tau_0$; at two times $t_1$ and
$t_2$ the working set size is $\omega(t_1,\tau_0) = w_1$ and
$\omega(t_2,\tau_0) = w_2$, and so the size of memory B varies at
least over the range $(w_1, w_2)$. Figure 3-15b shows that memory A
does not vary, but is fixed at M. Hence at the times $t_1$ and $t_2$
memory A is operating at $\tau_1$ and $\tau_2$ respectively. That is,
we may hold the working set size fixed at M by varying $\tau$ so that
the working set is always exactly contained in memory A.

Thus, memory A is simulated by a working set strategy with
$\tau$ varying around mean $\bar{\tau}$ on the range
$(\tau_1, \tau_2)$, and memory B has $\tau$ fixed at $\tau_0$.
Writing the missing-page probability for WS as $\lambda(\tau)$, and
recalling Theorem 3.4,

$$\mu_A(M) = \overline{\lambda(\tau)} \ge \lambda(\tau_0)$$

and so WS is at least as good as LRU.

Figure 3-15. Comparison of LRU and WS.

Intuitively this makes sense. Suppose the program is constant
at size M throughout execution except for a single reference
to the (M+1)st page. If it is in memory A, the reference
to the (M+1)st page displaces some other page in A, which must
be recalled.

Note that we have also shown that a variable size share of 
memory is superior to a fixed size share of memory. This makes 
sense since: 

1. As we have stressed, advance knowledge of program size 
is often non-existent, and indeterminable. The share 
cannot be chosen optimally in advance of execution. 

2. If each program gets a fixed share of memory, we cannot 
guarantee that memory is densely packed with the most 
useful information. A small program operating in too 
large a space is occupying space it does not need, space 
which could and should be given over to a large program 
operating in too small a space. Using variable shares 
permits allocating space on the basis of need. 

If the programs in question do not satisfy locality, the 
arguments above fall apart. Consider, for example, the case of 
an (n+l)-page program which cycles endlessly through the (n+1) 
pages; operate this program in a memory of size n. Clearly, the 
least recently used page is the one about to be referenced, so 
LRU makes the worst possible decision. Similarly, FIFO and FINUFO
remove a page just before it is referenced. Only RAND has non-zero
probability of not making a mistake, and is the best of the
four. If the working set parameter is $\tau < n$, the working set
never contains the page next to be referenced, and so WS is
poor in this case too.
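
The cyclic example can be reproduced directly. The sketch below
assumes an idealized LRU with exact reference order, a FIFO queue,
and a working set formed from the $\tau$ references immediately
preceding each reference; the function names and the choice n = 4 are
illustrative.

```python
from collections import OrderedDict, deque


def lru_faults(refs, n):
    """Page faults under idealized LRU in a memory of n pages."""
    memory, faults = OrderedDict(), 0
    for page in refs:
        if page in memory:
            memory.move_to_end(page)
            continue
        faults += 1
        if len(memory) >= n:
            memory.popitem(last=False)   # remove least recently used page
        memory[page] = True
    return faults


def fifo_faults(refs, n):
    """Page faults under FIFO in a memory of n pages."""
    memory, faults = deque(), 0
    for page in refs:
        if page in memory:
            continue
        faults += 1
        if len(memory) >= n:
            memory.popleft()             # remove page least recently paged in
        memory.append(page)
    return faults


def ws_misses(refs, tau):
    """References to pages outside the window of tau preceding references."""
    return sum(1 for k, page in enumerate(refs)
               if page not in refs[max(0, k - tau):k])


n = 4
refs = [k % (n + 1) for k in range(50)]   # cycles through n+1 pages

print(lru_faults(refs, n), fifo_faults(refs, n))
print(ws_misses(refs, n - 1), ws_misses(refs, n + 1))
```

Under LRU and FIFO every one of the 50 references is a page fault.
The working set misses every reference when $\tau < n$, but once
$\tau$ reaches n+1 only the first reference to each page is missed.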



3.5.3. Multiprogrammed Case

How do programs interact with each other, if at all, under
each of these strategies? How can the memory demands of one
program interfere with the execution of another? We can obtain
answers to these questions by examining the missing-page
probability.

The missing-page probability $\mu$ is the probability that a
process makes a reference to a page not in main memory. In the
multiprogrammed case, we expect it to be a function of the
program size r (r is the number of pages in the program), of the
number n of programs simultaneously resident in main memory, and
of the main memory size M:

(3.5.8)  $$\text{(missing-page probability)} = \mu(n,r,M)$$

In the following discussion we assume that locality is a basic 
behavior property. 

Suppose there are n programs in main memory; intuitively 
we expect that if the totality of working sets does not exceed 
the main memory size M, then no program loses its favored pages 
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to the expansion of another . That is, as long as 

n 

(3.5.9) V co i (t,T i ) < M 
i = l 

there will be no interaction among programs, and we expect the
missing-page probability to be small. But when n exceeds some
critical number n_0, the totality of working sets exceeds M, the
expansion of one program displaces the favored pages of another,
and so the missing-page probability increases sharply with n.
Thus, we have

(3.5.10)    \mu(n_1,r,M) > \mu(n_2,r,M)   if   n_1 > n_2

This is illustrated in Figure 3-16. In other words, it costs 
more to operate a program in a crowded memory than to operate 
it in a roomy memory. 

If a paging algorithm operates in the range n > n_0, we
will say it is saturated.
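
As a rough illustration of the saturation condition, the sketch below finds the critical number for a given list of working set sizes and tests whether a set of resident programs overflows memory. The function names and the sample sizes are hypothetical, and programs are assumed to be admitted in list order.

```python
def critical_number(working_set_sizes, memory_size):
    """Largest n such that the first n working sets fit in memory_size pages."""
    total = 0
    n0 = 0
    for size in working_set_sizes:
        if total + size > memory_size:
            break
        total += size
        n0 += 1
    return n0

def is_saturated(working_set_sizes, memory_size):
    """True when the totality of working sets exceeds the memory size M."""
    return sum(working_set_sizes) > memory_size

sizes = [30, 25, 40, 20, 35]   # hypothetical working set sizes, in pages
M = 100                        # hypothetical main memory size, in pages
n0 = critical_number(sizes, M)
print(n0, is_saturated(sizes[:n0], M), is_saturated(sizes[:n0 + 1], M))
```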

Next we want to show that the RAND, FIFO, LRU, and FINUFO
algorithms have the property that

(3.5.11)    \mu(n,r_1,M) > \mu(n,r_2,M)   if   n > n_0  and  r_1 > r_2

That is, a large program is more likely to lose pages than a small
program, when the algorithm is saturated. Put another way, it
costs more per page to operate a large program in a crowded
memory than to operate a small program in a crowded memory.

To see that this is true for RAND, observe that a large
program occupies more space in memory than a small program, and so
has more pages as candidates for random selection. To see that
this is true for FIFO, observe that a large program tends to
execute longer than a small one, and is thus more likely to be
still in execution when FIFO gets around to replacing its pages.
To see that this is true under LRU, recall that if program P_1 is
bigger than P_2, then the mean interreference intervals satisfy
x̄_1 > x̄_2; the large programs are the ones that tend to
reference the least recently used pages. To see that this is
true for FINUFO is more difficult. If n > n_0, all the use bits
are ON, until some program stops for a page wait; since it can
no longer set its use bits, such a program will tend to lose all
its pages. Inasmuch as large programs have more pages, a large
program will suffer more when it enters page wait.

¹ Though it may lose favored pages because of foolish decisions
by the replacement rule; for example, RAND or FIFO.

Figure 3-16. Behavior of missing-page probability.

By definition, a WS algorithm makes the missing-page
probability independent of n and M, since eq. 3.5.9 is assumed to
be satisfied. In fact, Theorem 3.3 shows that the missing-page
probability depends only on τ in this case: λ(τ) = 1 - F_x(τ).

Thus, the RAND, FIFO, LRU, and FINUFO policies result in 
higher costs when the memory is crowded. By avoiding crowding, 
WS results in lower cost. 

If we ask how well each strategy fares in keeping the working
set of a process in memory, we again see that FIFO and RAND are
worse than LRU, which is in turn comparable to FINUFO. If we
regard the entire memory contents as a large multiprocess
computation, the same arguments of the preceding section show
that WS results in lower missing-page probability than LRU. If
FINUFO is kept away from saturation, it should perform nearly as
well as WS. (FINUFO is nearing saturation when it cycles once
through the memory in a time comparable to the traverse time T.)

It might be argued that our comparison of LRU and WS (at
least in the single program case) is not strictly valid because
WS operates with surplus memory. That is, a larger memory is
needed in order to provide buffer space to absorb working set
expansions. This is quite true if only one program is using
memory. In the multiprogrammed case WS shows its superiority:

1. WS makes programs independent in the sense that the 
expansion of one program cannot displace working set 
pages of another. 

2. When the size of the memory becomes large, the fractional 
requirement for buffer space to absorb working set ex- 
pansions becomes small. This is shown in Chapter 8. 

3. If T is properly chosen, each program operates in the 
vicinity of its optimum cost size (s , in Figure 3-11) 
— thus it is possible to fit more programs cheaply 
into memory under a WS strategy than under any other 
strategy. 



3.5.4. Use of Biased Replacement Rules 

Belady has shown that biasing the FIFO rule (see Section 3.4) 
on the M44/44X computer improved performance significantly [B2]. 
We wish to show that this is true in general: by slowly varying
the memory share of a program the probability of referencing a
missing page is reduced. A WS strategy is still superior
because it varies the memory share exactly in accordance with need,
whereas the biased rules do not guarantee that the memory share
enlarges at the times the program would like to see it enlarged.

We shall show that biasing the LRU rule improves performance, 
but not to the point of WS . Since LRU is better than RAND or 
FIFO, it follows that biasing RAND or FIFO produces corresponding 
improvements. Since a non-saturated FINUFO rule behaves very 
much like a WS rule, there is little point to biasing it. 

To show biasing improves LRU we shall show biasing increases
the average value of τ for each value of the random variable
ω(t,τ). Suppose ω(t,τ) is known; choose τ = τ_0 such that
τ_0 = ω^{-1}(t,M), where M is the memory size and ω^{-1} is the
inverse function of ω(t,τ) with respect to τ. Now let the memory
size be s, and let s vary such that its mean is s̄ = M. Since
ω(t,τ) is concave (see Figure 3-9), we have from Theorem 3.1

    \overline{\omega^{-1}(t,s)} \ge \omega^{-1}(t,M)

that is,

    \bar{\tau} \ge \tau_0

Since the average value of τ has been increased, the missing-page
probability λ(τ) must decrease:

    \lambda(\bar{\tau}) \le \lambda(\tau_0)

Of course, since the memory variation is out of phase with the
variation of ω(t,τ), a pure WS strategy is better.
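
The averaging step can be illustrated numerically. The sketch below is not taken from the text: the square-root working set curve, the exponential missing-page curve, and the three memory shares are hypothetical choices, made only so that ω is concave in τ and λ is non-increasing.

```python
import math

def omega(tau):
    """Hypothetical concave working-set-size curve."""
    return math.sqrt(tau)

def omega_inverse(s):
    """Inverse of omega with respect to tau (convex, since omega is concave)."""
    return s * s

def lam(tau):
    """Hypothetical non-increasing missing-page probability."""
    return math.exp(-tau / 50.0)

M = 10.0
shares = [8.0, 10.0, 12.0]   # memory share s varies, with mean M
tau_0 = omega_inverse(M)
tau_bar = sum(omega_inverse(s) for s in shares) / len(shares)
print(tau_0, tau_bar, lam(tau_0), lam(tau_bar))
```

With these choices the average τ exceeds τ_0 and the missing-page probability evaluated at the average is correspondingly lower.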
3.6. Thrashing

It has been observed that even the slightest attempt to
overuse memory may trigger a total collapse of service efficiency,
rather than the moderate degradation that might be expected.
This phenomenon is known as thrashing. We show that thrashing
is caused by the large value of the traverse time T.

In this section we write μ for the missing-page probability.



3.6.1. The Causes 

Suppose that a certain process has executed for a virtual
time interval of length V and that the missing-page probability
μ is constant over this interval. The expected number of page
waits is then (Vμ), each costing one traverse time T. We define
the duty factor η(μ) to be:

    \eta(\mu) = \frac{\text{elapsed virtual time}}{\text{elapsed virtual time} + \text{elapsed page wait time}}

(3.6.1)    \eta(\mu) = \frac{V}{V + V\mu T} = \frac{1}{1 + \mu T}

η(μ) measures the ability of a process to use a processor.
Figure 3-17 shows η(μ) sketched for five values of T:
    T = 1, 10, 100, 1000, 10000 vtu
If 1 vtu is taken to be 1 microsecond, and the rotation time of
the fastest existing rotating auxiliary storage devices is taken to
be 10 milliseconds, then T=10000 vtu may be regarded as typical
for existing computer systems.
Figure 3-17. Duty factor for various T.

The slope of η(μ) is

(3.6.2)    \eta'(\mu) = \frac{d}{d\mu}\,\eta(\mu) = -\frac{T}{(1 + \mu T)^2}

which for small μ and T≫1, is extremely sensitive to a change
in μ. It is this extreme sensitivity of η(μ) to changes in μ
for large T that is responsible for thrashing.
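
To make the sensitivity concrete, the following sketch evaluates eqs. 3.6.1 and 3.6.2 for the five traverse times of Figure 3-17. The function names and the sample value μ = 0.001 are illustrative assumptions.

```python
def duty_factor(mu, T):
    """Eq. 3.6.1: fraction of time a process spends running."""
    return 1.0 / (1.0 + mu * T)

def duty_factor_slope(mu, T):
    """Eq. 3.6.2: derivative of the duty factor with respect to mu."""
    return -T / (1.0 + mu * T) ** 2

for T in (1, 10, 100, 1000, 10000):
    # duty factor at mu = 0.001, and slope at mu = 0 where it is steepest
    print(T, duty_factor(0.001, T), duty_factor_slope(0.0, T))
```

At μ = 0 the slope is simply -T, so a traverse time of 10000 vtu makes the duty factor ten thousand times more sensitive to μ than a traverse time of 1 vtu.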

To show how the slightest attempt to overuse memory can
wreck processing efficiency, we perform the following conceptual
experiment. We imagine a set of (n+1) identical programs, n of
which are initially operating together in a memory, at the verge
of saturation (i.e., n=n_0 in Figure 3-10) with no sharing. Then
we examine the effect of introducing the (n+1)st program.

Let 1,2,...,(n+1) be this set of (n+1) programs, each of
size r. Initially, n of them occupy the memory, so that the
memory size is M=nr. Let μ_0 denote the missing-page probability
under these circumstances; assume μ_0 ≪ 1, and that η(μ_0) is
reasonable (i.e., it is not true that η(μ_0) ≪ 1). Then the
expected number of busy processors (ignoring the cost of
switching a processor between processes) is:

(3.6.3)    B_n = \sum_{i=1}^{n} \eta_i(\mu_0) = \frac{n}{1 + \mu_0 T}

Now introduce the (n+1)st program. The missing-page probability
increases to (μ_0+δ) and the expected number of busy processors
becomes

(3.6.4)    B_{n+1} = \sum_{i=1}^{n+1} \eta_i(\mu_0+\delta) = \frac{n+1}{1 + (\mu_0+\delta)T}
Now, if nr pages consume the memory and we squeeze another size r
program into memory, the resulting increase in missing-page
probability is

(3.6.5)    \delta = \frac{r}{(n+1)r} = \frac{1}{n+1}

since we are assuming that the paging algorithm acquires the
additional r pages by displacing r pages uniformly from the (n+1)
programs now resident in memory. The fractional number of busy
processors after introduction of the (n+1)st program is

(3.6.6)    \frac{B_{n+1}}{B_n} = \frac{n+1}{n} \cdot \frac{1 + \mu_0 T}{1 + (\mu_0+\delta)T}



Now, assume T≫n≫1. We argue that δ = 1/(n+1) ≫ μ_0. To see this,
suppose to the contrary that δ ≈ μ_0; then

(3.6.7)    \eta(\mu_0) \approx \frac{1}{1 + \delta T} = \frac{1}{1 + \frac{T}{n+1}} = \frac{n+1}{n+1+T} \ll 1

which contradicts our assumption that, in the non-saturated
operating region, efficiency is reasonable. Thus, when T≫n≫1
and δ≫μ_0, it is easy to show that

(3.6.8)    \frac{B_{n+1}}{B_n} \approx \frac{n}{T} + n\mu_0 \ll 1

The presence of one additional program has caused a complete
collapse of service.
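
The conceptual experiment can be run with numbers. In the sketch below the values n = 20, T = 10000 vtu, and μ_0 = 0.00001 are assumptions chosen to satisfy T≫n≫1 and a reasonable initial duty factor; the formula for the number of busy processors is n/(1 + μT), as in eq. 3.6.1 applied to n identical programs.

```python
def busy_processors(n_programs, mu, T):
    """Expected number of busy processors for n identical programs."""
    return n_programs / (1.0 + mu * T)

n = 20            # programs initially in memory (assumed)
T = 10000.0       # traverse time in vtu (assumed)
mu_0 = 1e-5       # missing-page probability before overloading (assumed)
delta = 1.0 / (n + 1)                      # eq. 3.6.5

b_before = busy_processors(n, mu_0, T)
b_after = busy_processors(n + 1, mu_0 + delta, T)
ratio = b_after / b_before
approx = n / T + n * mu_0                  # large-T approximation of the ratio
print(b_before, b_after, ratio, approx)
```

Under these assumptions about 18 processors are busy before the extra program is admitted and well under one afterwards.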

The sharp difference between the two cases at first defies
the intuition, which might lead us to expect a gradual
degradation of service. The large value of the traverse time T is
the root cause. It is interesting to note that Smith [S7] has
warned of this behavior.



3.6.2. The Cures 

To cure or prevent thrashing, we must do two things: first,
we must prevent the missing-page probability μ from fluctuating;
and second, we must reduce the traverse time T.

In order to prevent μ from fluctuating, we must be sure
that the number n of programs residing in main memory satisfies
n ≤ n_0 (Figure 3-16), which is equivalent to the condition that

(3.6.9)    \sum_{i=1}^{n} \omega_i(t,\tau_i) \le M

where ω_i(t,τ_i) is the working set size of program i. In other
words, there must be space enough in memory for each program's
working set. This strongly suggests that a working set strategy
be used.

In order to get the largest number of programs in memory,
that is, to maximize n_0, we want to choose τ as small as
possible and yet be sure that W(t,τ) contains a process's favored
pages. Programmers can cooperate in this effort by designing
algorithms to operate locally on data, consciously keeping the
working set small and not moving about too rapidly. A programmer
is rewarded for this effort, because not only does he achieve
a high operating duty factor, he also pays less for use of memory.
With the FIFO, RAND, and LRU algorithms, it is very difficult
to ascertain n_0, and therefore difficult to control the
possible μ-fluctuations. The FINUFO algorithm displays some
natural tendency to refuse to run more than n_0 programs (the
extra ones tend to be completely unloaded).

The problem of reducing the traverse time T is more difficult.
Recall that T is the expectation of a random variable composed
of queue waits and mechanical delay factors. Using optimum
scheduling techniques on disk and drum auxiliary storage
devices [C2,D3,F3], together with parallel data channels, we
can effectively remove all but the mechanical delays from T;
accordingly, T may be made comparable to a disk arm seek time
or to half a drum revolution time. To reduce T further would
require reduction of the rotation time of the device (for
example, a 40,000 rpm drum).

A much more promising solution is to dispense altogether
with a rotating device as the second level of memory. A three-level
memory system (Figure 3-18) would be a possible solution,
where between the main level and the drum we have introduced a
slow speed bulk core storage. The analysis of Section 3.6.1
suggests that speed ratios in the order of 1:100 (i.e., T≈100
vtu) between adjacent devices would lead to much less sensitivity
to traverse times and permit tighter control over the factors
that cause thrashing. For example:

    level    type of memory device    access time

      0      thin film                200 ns
      1      slow speed core          20 μs
      2      very high speed drum     2 ms

Figure 3-18. Three-level memory system.






We cannot overemphasize, however, the importance of a 
sufficient supply of main memory, enough to contain the desired 
number of working sets. Paging is no substitute for real main 
memory. 
3.7. Survey of the Literature

Various studies concerning the behavior of paging algorithms
have appeared. The earliest published study, by Fine et
al. [F6], investigates the effects of demand paging and seriously
questions whether paging is worthwhile at all. Their experiments,
as well as the more discerning experiments of Varian and Coffman
[V1], confirm this: if a program is forced to operate in a space
smaller than its working set, considerable paging activity may
seriously interfere with efficiency. The remedy is not to
dismiss paging; it is to provide enough main memory. Paging is
no substitute for real memory.

Experience with the M44/44X computer has yielded important
insights into program behavior [O1]. Belady and his colleagues,
noting the concavity of efficiency vs. core-share curves, were
able to improve efficiency significantly by artificially varying
a program's core share; this led to the biased replacement rules
[B2]. Belady has defined a unit of storage allocation, the
parachor [B2], which is that amount of information that must be
loaded in main memory for the program to spend no more than half
its time in page wait. We shall discuss in Chapter 4 the
relation between parachor and working set. Belady has also
compared some of the paging algorithms mathematically [B1]. His
most important conclusion in this area is that an ideal
replacement rule should have much of the simplicity of RAND or FIFO
(for efficiency) and some, though not much, accumulation of data
on past reference patterns.

Randell and Kuehner [R2] have a good survey of all the
techniques commonly used to handle multiprogrammed memory
allocation, ranging from various name space concepts, across
look-ahead and replacement rules, to problems of optimum page size.

Oppenheimer and Weizer [O2] report on simulations of the
RCA Spectra 70/46 Time-Sharing Operating System when memory
allocation is based on a strategy related to working sets. Their
experiments indicate that this type of allocation markedly
improves performance.
3.8. Summary

Starting from the assumption that locality is a basic program
behavior property, we developed the working set model for
program behavior. Locality is the property that, during any
virtual time interval, a process favors only a subset of the pages
available to it; a working set is a dynamic measure of this set
of pages. Locality manifests itself as convexity in the working
set size and as concavity in the missing-page probability.
Experimental evidence suggests that locality is a very good
assumption. There is every reason to believe that a programmer
who keeps in mind a working set concept can make this property
strong in his programs.

A good performance measure for paging policies is the
missing-page probability, since lower missing-page probabilities
result in lower memory usage costs. We showed that working set
strategies achieve the lowest missing-page probabilities and
operate dynamically in a memory space close to that which achieves
minimum cost.

We also showed that thrashing is directly traceable to the 
large value of the traverse time T. By minimizing the possibility 
of fluctuations in the missing-page probability, a working set 
strategy can markedly decrease sensitivity to thrashing. 

Thus, a working set strategy has three advantages. First, 
it results in lowest costs of operating programs in memory. 
Second, it reduces sensitivity to thrashing. Third, it makes 
programs independent of one another, in the sense that memory 
acquisitions of one program do not interfere with the working 
set holdings of another. Because of this, analysis will be simple. 
In the next Chapter we refine the working set model and
derive detailed properties, in the case of no sharing. In 
Chapter 5 we give attention to the case of sharing, when working 
sets overlap. The reader who is interested in the ideas of 
demand and balance should turn directly to Chapter 6. 
CHAPTER 4



Further Properties of the Working Set Model 



4.0. Introduction 

Having seen the basic concepts beneath the working set model, 
we are in a position to investigate its properties more thoroughly. 
Here in this chapter we shall refine the working set model in 
the very simplest case: single-process computations with no infor- 
mation sharing . In Chapter 5 we shall investigate the additional 
complications that arise from multiprocess computations and over- 
lapping working sets. 

One of the properties of a working set memory management
policy is the statistical independence of working sets: the
expansion of one working set cannot displace pages of another.
Because of this, we may analyze the behavior of a single process
and its working set, and then extend the results in a simple way
to collections of independent processes with non-overlapping
working sets.
The quantities we shall derive here in this chapter are
described in the following table.

    quantity                       symbol     description

    missing-page probability       λ(τ)       probability that a process, when making
                                              an information reference, directs the
                                              reference to a page not in the working
                                              set W(t,τ).

    paging rate                    ρ(τ)       number of pages per unit real time
                                              re-entering a working set W(t,τ).

    expected working set size      ω(τ)       expected number of pages in working
                                              set W(t,τ).

    variance of working set size   σ_ω²(τ)

    duty factor                    η(τ)       fraction of time a running or page
                                              wait process spends running.

    τ-sensitivity                  S(τ)       rate of increase of missing-page
                                              probability to decrease in τ.

The interreference distribution F_x(u) plays a key role in the
analysis, since all these quantities may be expressed in terms
of F_x(u).

We begin by deriving an important result: the mean inter- 
reference interval is also the mean program size. An interesting 
consequence of this is that the expected working set size depends 
only on the interreference distribution. We derive, one by one, 
the quantities listed in the table above; then we show how each 
of these quantities is useful in determining the allowable range 
of τ-values to be used; we discuss the problem of predicting
working set sizes; and finally we discuss how a working set memory 
allocation strategy might be implemented. 
During the remainder of this chapter we assume:

1. No sharing. That is, working sets do not overlap.

2. Single-process computations.

3. Only working-set pages are in memory. Any page which
   leaves a working set is automatically removed from
   main memory.

4. Unlimited processor-memory resources. But, since only
   a finite number of processes, each with a finite working
   set, are active, only a finite amount of each resource
   type is in use.
The third assumption is a worst-case assumption, in the sense 
that a working set strategy would not normally retire a non- 
working-set page until there was need for the space it occupied. 
The fourth assumption allows us to ignore for the time being 
whatever additional problems arise from lack of equipment. We 
shall discuss these problems in Chapter 8, when we examine the 
equipment configuration. 
4.1. The Relation between Program Sizes and Interreference Intervals

As before, we define the random variable x to be the virtual
time interval between successive references to the same page.
Thus, these intervals x are the interarrival times between
references.

The distribution function is F_x(u) = Pr[x ≤ u]. The density
function is f_x(u) = (d/du) F_x(u). The mean is

(4.1.1)    \bar{x} = \int_0^\infty u f_x(u)\,du = \int_0^\infty (1 - F_x(u))\,du

where this latter integral can be verified by integrating the
former by parts.¹ The second moment is

(4.1.2)    \overline{x^2} = \int_0^\infty u^2 f_x(u)\,du

and the variance is \sigma_x^2 = \overline{x^2} - \bar{x}^2. We assume
both \bar{x} and \overline{x^2} are finite (that \bar{x} is finite is
shown shortly in Theorem 4.1).



¹ The formula is: \int y\,dz = yz - \int z\,dy. In eq. 4.1.1 let y=u,
dy=du, dz=f_x(u) du, z=F_x(u). We integrate from 0 to α, and let α
tend to infinity when we are done. Then

    \int_0^\alpha u f_x(u)\,du = yz\Big|_0^\alpha - \int_0^\alpha z\,dy
        = u F_x(u)\Big|_0^\alpha - \int_0^\alpha F_x(u)\,du - \alpha + \int_0^\alpha du

where we have added and subtracted \alpha = \int_0^\alpha du. Noting that
u F_x(u)\big|_0^\alpha = \alpha F_x(\alpha) and regrouping terms,

    \int_0^\alpha u f_x(u)\,du = \int_0^\alpha (1 - F_x(u))\,du - \alpha(1 - F_x(\alpha))

To complete the proof, we must show α(1-F_x(α)) tends to 0 as α
tends to infinity, when x has finite mean. Now, α(1-F_x(α)) → 0
if and only if (1-F_x(α))/(1/α) → 0 if and only if (L'Hospital's
Rule) f_x(α)/(1/α²) → 0, which is exactly the condition that x̄ be
finite.
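
The identity in eq. 4.1.1 can be checked numerically for a particular distribution. The sketch below assumes an exponential interreference distribution with mean 5 vtu, which is a choice made here for illustration and not a claim about real programs, and integrates both forms with a simple trapezoid rule.

```python
import math

def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

mean = 5.0  # assumed mean interreference interval, in vtu

def density(u):
    return math.exp(-u / mean) / mean

def distribution(u):
    return 1.0 - math.exp(-u / mean)

upper = 40 * mean  # the tail beyond this point is negligible
lhs = trapezoid(lambda u: u * density(u), 0.0, upper, 20000)
rhs = trapezoid(lambda u: 1.0 - distribution(u), 0.0, upper, 20000)
print(lhs, rhs)
```

Both integrals come out close to the assumed mean.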
By a program Z(t) at time t we mean the set of pages toward
which a process directs its information references.¹ The program
size is z(t) = |Z(t)|. We assume that z(t) is a stationary
stochastic process, so that we may write z instead of z(t), and
z̄ instead of \overline{z(t)}. The program size distribution is
F_z(u), across the ensemble of all programs.

We must be careful not to confuse a program Z(t) with a working
set W(t,τ). A working set W(t,τ) is related to a program
Z(t), thus:

(4.1.3)    W(t,\tau) \subseteq \bigcup_{s \in (t-\tau,\,t)} Z(s) = Z(t,\tau)

where (t-τ,t) is a virtual time interval. Because of our
assumption of locality, we assume also that the content of Z(t) does
not change appreciably over intervals of length τ, so that the
size of Z(t,τ) is described by the random variable z. Thus,
we assume

(4.1.4)    |Z(t,\tau)| = z

Recalling the definition of the working set size ω(t,τ), we have

(4.1.5)    \omega(t,\tau) \le z

and hence

(4.1.6)    \omega(\tau) \le \bar{z}

¹ A more detailed view would note that a program contains the
instruction stream that directs the activity of a process,
together with the data used. However, we do not require this
much detail in our analysis.
Theorem 4.1. Let x be the interreference intervals, with mean x̄.
Let z be the program sizes, with mean z̄. Then x̄ = z̄.

Proof: Refer to Figure 4-1, where we have shown a set of z
pages constituting a certain program. Let

(4.1.7)    p_i = \Pr[\text{process references page } i, \text{ when program size is } z]

then,

(4.1.8)    \sum_{i=1}^{z} p_i = 1

Now, let

(4.1.9)    (x|z)_i = [\text{interreference interval to page } i, \text{ when program size is } z]

In a sequence of independent trials, with Pr[success]=p_i, the
expected waiting time until success is 1/p_i. Thus,

(4.1.10)    \overline{(x|z)_i} = \frac{1}{p_i}

For the entire program,

(4.1.11)    \overline{(x|z)} = \sum_{i=1}^{z} \overline{(x|z)_i}\, p_i = \sum_{i=1}^{z} \frac{1}{p_i}\, p_i = z

Now, taking the expectation on z,

(4.1.12)    \bar{x} = \overline{\overline{(x|z)}} = \bar{z}

QED.
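
Theorem 4.1 can be illustrated by simulation for a single program of fixed size. The sketch below assumes independent references, with page i chosen with probability proportional to a hypothetical weight; the function name, the weights, and the number of references are choices made here. It averages the observed intervals between successive references to the same page.

```python
import random

def mean_interreference_interval(weights, n_refs, seed=1):
    """Average interval between successive references to the same page."""
    rng = random.Random(seed)
    pages = list(range(len(weights)))
    refs = rng.choices(pages, weights=weights, k=n_refs)  # one reference per vtu
    last_seen = {}
    total = 0
    count = 0
    for t, page in enumerate(refs):
        if page in last_seen:
            total += t - last_seen[page]
            count += 1
        last_seen[page] = t
    return total / count

z = 10                            # program size in pages (assumed)
weights = list(range(1, z + 1))   # hypothetical, deliberately non-uniform
x_bar = mean_interreference_interval(weights, 200_000)
print(x_bar)
```

Even though the pages are referenced with very different probabilities, the mean interval comes out close to the program size z.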



Corollary 4.1. Let Z_1 and Z_2 be programs; let x_1 be the
interreference intervals to Z_1 and x_2 be the interreference
intervals to Z_2. If Z_1 is bigger than Z_2, then x̄_1 > x̄_2.

Proof: Let z_1 = |Z_1| and z_2 = |Z_2|. We have

    \bar{x}_1 = \overline{(x|z_1)} = z_1 > z_2 = \overline{(x|z_2)} = \bar{x}_2
Figure 4-1. A simple program model. (A stream of references, at
rate 1 per vtu, is directed to the pages of the program.)
Note, however, that we cannot also claim F_z(u)=F_x(u); i.e.,
that the interreference distribution is also the program size
distribution. In a given computer system, there will
be some largest program size z_0. It is unreasonable to assume

(4.1.13)    \Pr[x > z_0] = 0

because even the largest program may contain pages it uses only
rarely. Thus, we expect that the variance σ_x² of interreference
intervals will be greater than the variance σ_z² of program size.

Since F_x(u) and F_z(u) describe the ensemble of all programs,
we can make no claim that any given program is reliably described
by F_x(u) or F_z(u). We can, however, claim that a balance set
of programs, being large, is representative of the ensemble; thus
the quantities we shall derive in the next sections, expressed
in terms of F_x(u), are applicable to balance sets of programs.

A question which may have occurred to the reader is: How 
does page size (number of words to a page) enter into our con- 
siderations? Page size is accounted for implicitly in the defin- 
itions of the interreference intervals x and the traverse time T. 
On the one hand, halving the page size makes the same program 
comprise twice as many pages; from Theorem 4.1, we see that the 
interreference intervals become twice as long. That is, smaller 
pages are referenced less often. On the other hand, the traverse 
time T contains a component due to page transmission time, which 
depends on page size. 

Thus, provided that pages are sufficiently small that work- 
ing sets contain several pages, all our results are independent 
of the page size. 
4.2. Missing-Page Probability

We showed in Theorem 3.3 that the missing-page probability
depends on F_x(u):

(4.2.1)    \lambda(\tau) = 1 - F_x(\tau)

and is just the probability that the page referenced satisfies
x>τ. The next theorem shows that λ(τ) may be regarded as the
rate, in virtual time, at which pages re-enter the working set
W(t,τ); that is, 1/λ(τ) is the expected virtual time interval
between references to a page not in W(t,τ). See Figure 4-2.

Theorem 4.2. Let λ(τ)=1-F_x(τ) be the missing-page probability.
Then λ(τ) is also the number of pages per unit virtual
time re-entering W(t,τ).

Proof: Let Z(t) be a program. We consider first the behavior
of a typical page i in Z(t) and then obtain the behavior of Z(t)
by summing the behaviors of its component pages.

Let {t_n}, n≥0, be a sequence of virtual time instants at which
references to page i occur (Figure 4-3). The nth interreference
interval is

(4.2.2)    x_n = t_n - t_{n-1}

Now, we assume the interreference intervals {x_n}, n≥1, are
statistically independent,¹ so that for all n≥1:

(4.2.3)    f_{x_n}(u) = f_x(u)

¹ This assumption does not contradict the assumption of locality.
Locality implies only that the favored pages have short
interreference intervals.

Figure 4-2. Illustrating the meaning of re-entry rates. (Pages
enter W(t,τ) for the first time; pages re-enter the working set,
at virtual time rate λ(τ) and real time rate ρ(τ); pages leave
W(t,τ) for the last time.)

A re-entry point is a reference instant that finds the page not
in W(t,τ): at such an instant the page re-enters W(t,τ). Observe
that t_n is a re-entry instant if and only if x_n>τ, independent
of other reference instants. Suppose t_0 is a re-entry; we are
interested in π_n, the probability

(4.2.4)    \pi_n = \Pr[t_n \text{ is first re-entry after } t_0]

The probabilities {π_n}, n≥1, are distributed geometrically:

(4.2.5)    \pi_n = (F_x(\tau))^{n-1} (1 - F_x(\tau))

That is, t_n is the first re-entry after t_0 if and only if each
of the intervals x_1,...,x_{n-1} satisfies x_1≤τ,...,x_{n-1}≤τ and
x_n satisfies x_n>τ. The expected number of references until the
re-entry is

(4.2.6)    \bar{n} = \sum_{n=1}^{\infty} n\,\pi_n = \frac{1}{1 - F_x(\tau)}



Each interreference interval x is of expected length x̄, so the
expected time between re-entries is

(4.2.7)    \bar{n}\,\bar{x} = \frac{\bar{x}}{1 - F_x(\tau)}

Let us define the virtual time re-entry rate λ_i(τ) for page i
in Z(t) to be:

(4.2.8)    \lambda_i(\tau) = \frac{1 - F_x(\tau)}{\bar{x}}

Next, suppose Z(t) contains z pages. Given z, the total re-entry
rate for Z(t) is:

(4.2.9)    (\lambda(\tau)|z) = \sum_{i=1}^{z} \lambda_i(\tau) = z\,\frac{1 - F_x(\tau)}{\bar{x}}

Then, taking the expectation on z,

(4.2.10)    \lambda(\tau) = \overline{(\lambda(\tau)|z)} = \bar{z}\,\frac{1 - F_x(\tau)}{\bar{x}}
But, since z̄ = x̄ (Theorem 4.1), we obtain finally

(4.2.11)    \lambda(\tau) = 1 - F_x(\tau)

QED.

Since F_x(τ) is a non-decreasing function of τ, λ(τ) is a
non-increasing function of τ. Thus, decreasing τ can never result
in a decrease in the missing-page probability.
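
A simulation can illustrate the result for one program. The sketch below again assumes independent references with hypothetical page weights, one reference per vtu, and a discrete time axis. Under that assumption the interval between references to page i is geometric, so the predicted missing-page probability is the sum over pages of p_i (1 - p_i)^τ; the simulation counts the repeat references that find their page outside W(t,τ).

```python
import random

def simulate_missing_page_probability(weights, tau, n_refs, seed=1):
    """Fraction of repeat references whose page was last used more than tau ago."""
    rng = random.Random(seed)
    pages = list(range(len(weights)))
    refs = rng.choices(pages, weights=weights, k=n_refs)
    last_seen = {}
    repeats = 0
    reentries = 0
    for t, page in enumerate(refs):
        if page in last_seen:
            repeats += 1
            if t - last_seen[page] > tau:   # page had left W(t, tau)
                reentries += 1
        last_seen[page] = t
    return reentries / repeats

def predicted_missing_page_probability(weights, tau):
    """Sum of p_i * (1 - p_i)**tau, valid for the independent-reference assumption."""
    total = sum(weights)
    return sum((w / total) * (1.0 - w / total) ** tau for w in weights)

weights = list(range(1, 11))   # hypothetical page weights
tau = 20                       # working set parameter, in vtu
measured = simulate_missing_page_probability(weights, tau, 200_000)
predicted = predicted_missing_page_probability(weights, tau)
print(measured, predicted)
```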
4.3. Paging Rate

Assuming that the memory management mechanism guarantees
that a page resides in main memory if and only if it is in a
working set, every re-entry point in virtual time corresponds to a
page wait in real time. Thus, every page re-entering W(t,τ) must
be recalled from auxiliary memory, and contributes to page traffic.

Define the paging rate ρ(τ) to be the number of pages per
unit real time re-entering the working set W(t,τ). That is,
1/ρ(τ) is the expected real time between re-entries. See Figure 4-2.

Theorem 4.3. Let ρ(τ) be the paging rate. Then

(4.3.1)    \rho(\tau) = \frac{\lambda(\tau)}{1 + \lambda(\tau) T}

where T is the traverse time, and λ(τ) is the missing-page
probability.

Proof: The expected virtual time between re-entries is 1/λ(τ),
by Theorem 4.2. Then the expected real time between re-entries
is

(4.3.2)    \frac{1}{\rho(\tau)} = \frac{1}{\lambda(\tau)} + T

so that

    \rho(\tau) = \frac{\lambda(\tau)}{1 + \lambda(\tau) T}

QED.
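
Eq. 4.3.1 is simple to evaluate. The sketch below uses a hypothetical function name and sample values; it also shows that the paging rate of a single process can never exceed 1/T, since each re-entry costs at least one traverse time.

```python
def paging_rate(lam, T):
    """Eq. 4.3.1: pages per unit real time re-entering the working set."""
    return lam / (1.0 + lam * T)

T = 10000.0                     # traverse time in vtu (assumed)
for lam in (1e-5, 1e-4, 1e-3, 1e-2, 1.0):
    print(lam, paging_rate(lam, T), 1.0 / T)
```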



Observe that ρ(τ) may be interpreted as

(4.3.3)    \rho(\tau) = \frac{\text{number of re-entries}}{\text{elapsed real time}}

because, in a virtual time interval of length V, there are Vλ(τ)
re-entries; each costs one traverse time T, so the elapsed real
time must be (V + Vλ(τ)T).

With a balanced memory (i.e., the totality of working sets
constituting the balance set B does not exceed memory), process j
in the balance set B contributes ρ_j(τ) to the total returning
page traffic Y(τ):

(4.3.4)    Y(\tau) = \sum_{j \in B} \rho_j(\tau)

so that Y(τ) estimates the total traffic of pages being recalled
to memory, and is therefore a lower bound on the capacity required
of the channel bridging the two levels of memory.

The rate Y(τ) does not include page traffic resulting from:

1. computations entering and leaving the balance set B; 

2. pages being referenced for the first or last time by 
processes in B (see Figure 4-2). 

Given the rate at which each of these occurs, one can estimate 
the true total paging rate. These adjustments are straightfor- 
ward, so we shall not pursue the matter further. 

We must emphasize that the rates ρ(τ) and Y(τ) are estimates
of steady-state behavior, under the assumptions of Section 4.1.
The important point is: starting from the interreference
distribution F_x(u) and the definition of W(t,τ), it is possible to
estimate these rates.
4.4. Working Set Size

Let Z(t) be a program, and let z(t) be the random variable
of the size of Z(t). Starting from the assumption that z(t)=z
is a stationary stochastic process, we shall derive expressions
for the mean and variance of working set size ω(t,τ). The
importance of these results is that a program's main memory
requirement is completely determined by its page interreference
activity.



Theorem 4.4. Let $w(\tau) = \overline{\omega(t,\tau)}$ be the expected working set size.
Then

(4.4.1)    $w(\tau) = \int_0^\tau (1 - F_x(u))\,du = \int_0^\tau \lambda(u)\,du$

where $\lambda(u)$ is the missing-page probability.

Proof: Refer to Figure 4-4, where we have shown an interval in
virtual time for a typical page in $W(t,\tau)$. Define the random
variable

(4.4.2)    $y$ = [length of the interreference interval containing the time instant $t$]

Thus, if we choose a point $t$ at random on the virtual time axis,
$y$ is the length of the interval in which $t$ lies. The density
function $f_y(u)$ for $y$ is not the same as that of the interreference
intervals $x$ because, even though long intervals are less
likely than short intervals, they occupy a larger fraction of
the virtual time axis. A little thought should convince the
reader that the probability that $t$ is contained in an interval
of length $u$ is just the fraction of the time axis occupied by
intervals of length $u$:

(4.4.3)    $f_y(u) = \frac{u\, f_x(u)}{\bar{x}}$

For a complete discussion of this property, see Feller [F2, Vol. 2,
p. 10ff]. Let $i$ denote a typical page in the program $Z(t)$.
Define the binary random variable

    $\gamma_i = 1$ if $i \in W(t,\tau)$, and $\gamma_i = 0$ otherwise.

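Refer again to Figure 4-4, and use $t=0$ as the left end of the
interval. Suppose $y=u$; then

(4.4.4)    $\Pr[\gamma_i = 0 \mid y = u] = \Pr[u > \tau \text{ and } t \in (\tau, u)]$

(4.4.5)    $\Pr[\gamma_i = 0] = \int \Pr[\gamma_i = 0 \mid y = u]\, f_y(u)\,du = \int_{u > \tau} \Pr[u > \tau \text{ and } t \in (\tau, u)]\, f_y(u)\,du$

Now, $t$ may fall randomly on the interval $(0,y)$, so

(4.4.6)    $\Pr[u > \tau \text{ and } t \in (\tau, u)] = \frac{u - \tau}{u}$

then

(4.4.7)    $\Pr[\gamma_i = 0] = \int_\tau^\infty \frac{u - \tau}{u}\, f_y(u)\,du = \frac{1}{\bar{x}} \int_\tau^\infty (u - \tau)\, f_x(u)\,du$

Carrying out this integration (by parts) we obtain

(4.4.8)    $\Pr[\gamma_i = 0] = 1 - \frac{1}{\bar{x}} \int_0^\tau \lambda(u)\,du$

where $\lambda(u) = 1 - F_x(u)$. Then

(4.4.9)    $\Pr[\gamma_i = 1] = \frac{1}{\bar{x}} \int_0^\tau \lambda(u)\,du$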

Now, observe that

(4.4.10)    $\omega(t,\tau) = \sum_{i \in Z(t)} \gamma_i$

Suppose $|Z(t)| = z$. Then, given $z$, the expected working set size is

(4.4.11)    $(w(\tau) \mid z) = \overline{\omega(t,\tau)} = \sum_{i=1}^{z} \Pr[\gamma_i = 1] = z \Pr[\gamma_i = 1]$

Then, taking expectation on $z$,

(4.4.12)    $w(\tau) = \overline{(w(\tau) \mid z)} = \frac{\bar{z}}{\bar{x}} \int_0^\tau \lambda(u)\,du$

Finally, from Theorem 4.1 we have $\bar{z} = \bar{x}$, so that

    $w(\tau) = \int_0^\tau \lambda(u)\,du$

QED.
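As a numerical check of Theorem 4.4, the sketch below integrates the missing-page probability with the trapezoidal rule and compares the result with the closed form for exponentially distributed interreference intervals (the case treated in Section 4.9). The mean interval, the value of $\tau$, and the function name are assumptions made for the illustration.

```python
import math


def expected_ws_size(lam, tau: float, steps: int = 10_000) -> float:
    """Approximate w(tau) = integral from 0 to tau of lam(u) du (trapezoidal rule)."""
    h = tau / steps
    total = 0.5 * (lam(0.0) + lam(tau))
    for k in range(1, steps):
        total += lam(k * h)
    return total * h


x_bar = 50.0                                  # assumed mean interreference interval (vtu)
lam = lambda u: math.exp(-u / x_bar)          # lam(u) = 1 - F_x(u), exponential case
tau = 100.0                                   # assumed working set parameter (vtu)

w_numeric = expected_ws_size(lam, tau)
w_closed = x_bar * (1.0 - math.exp(-tau / x_bar))
print(w_numeric, w_closed)
```

The two values should agree closely, and both stay below $\tau$, in line with the bound $w(\tau) \le \tau$.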



We should verify that the properties of Theorem 3.3 are
satisfied:

1. $w(\tau) \le \tau$

2. $w(0) = 0$

3. $w(\tau + s) \ge w(\tau)$ for $s \ge 0$

4. $w(\tau)$ convex.

Since $\lambda(u) = 1 - F_x(u) \le 1$,

(4.4.13)    $w(\tau) = \int_0^\tau \lambda(u)\,du \le \int_0^\tau du = \tau$

and properties 1 and 2 are satisfied. Since $\lambda(u) \ge 0$,

(4.4.14)    $w(\tau + s) = \int_0^{\tau+s} \lambda(u)\,du \ge \int_0^\tau \lambda(u)\,du = w(\tau)$

and property 3 is satisfied. To verify property 4, we show
that the second derivative of $w(\tau)$ is non-positive:

(4.4.15)    $\frac{d}{d\tau} w(\tau) = \lambda(\tau) = 1 - F_x(\tau)$

            $\frac{d^2}{d\tau^2} w(\tau) = -f_x(\tau) \le 0$ since $f_x(\tau) \ge 0$.

Comparing the theorem statement with eq. 4.1.1, we observe
also that

(4.4.16)    $\lim_{\tau \to \infty} w(\tau) = \bar{x} = \bar{z}$
And, since $\bar{z} = \bar{x}$,

(4.4.22)    $\overline{\omega^2(t,\tau)} = w(\tau) + \frac{\overline{z^2}}{\bar{x}^2}\, w^2(\tau) - \frac{w^2(\tau)}{\bar{x}}$

then

(4.4.23)    $\sigma_\omega^2(\tau) = \overline{\omega^2(t,\tau)} - w^2(\tau) = \overline{\omega^2(t,\tau)} - \frac{\bar{x}^2}{\bar{x}^2}\, w^2(\tau)$

            $= w(\tau) + \frac{\overline{z^2} - \bar{x}^2}{\bar{x}^2}\, w^2(\tau) - \frac{w^2(\tau)}{\bar{x}}$

            $= w(\tau) + \frac{w^2(\tau)}{\bar{x}^2}\,(\sigma_z^2 - \bar{x})$

QED.



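Corollary 4.5. The variance $\sigma_\omega^2(\tau)$ is lower-bounded by

(4.4.24)    $\sigma_\omega^2(\tau) \ge w(\tau)\left(1 - \frac{w(\tau)}{\bar{x}}\right)$

Proof: For any random variable $x$, $\sigma_x^2 \ge 0$, so put $\sigma_z^2 = 0$ into the
expression above for $\sigma_\omega^2(\tau)$.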



Observe, from eq. 4.4.16 and 4.4.17, that 

o 
and that $\sigma_\omega^2(\tau)$ attains a maximum value for some $\tau > 0$.

QED.

4.5. Duty Factor

The duty factor $\eta(\tau)$ of a process is the fraction of time
it is able to spend computing:

(4.5.1)    $\eta(\tau) = \frac{\text{(elapsed virtual time)}}{\text{(elapsed virtual time)} + \text{(elapsed page wait time)}}$

$\eta(\tau)$ measures the ability of a process to use a processor.

Theorem 4.6. The duty factor $\eta(\tau)$ is given by

(4.5.2)    $\eta(\tau) = \frac{1}{1 + \lambda(\tau) T}$

where $\lambda(\tau)$ is the missing-page probability, and $T$ is the
traverse time.

Proof: Suppose the process has executed for $V$ vtu, with no
interruptions other than page waits. The time spent in page wait
is then $V\lambda(\tau)T$ and so

    $\eta(\tau) = \frac{V}{V + V\lambda(\tau)T} = \frac{1}{1 + \lambda(\tau)T}$

QED.



The duty factor has already appeared in Section 3.6, on thrashing. 

We may interpret $\eta(\tau)$ as the probability that, if we look
at a process at some random time, we find it running.
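A minimal sketch of the duty factor follows, computed both from eq. 4.5.2 and from the elapsed times used in the proof of Theorem 4.6. The function names and the numerical values are assumptions made for illustration.

```python
def duty_factor(lam: float, T: float) -> float:
    """eta = 1 / (1 + lam * T), as in eq. 4.5.2."""
    return 1.0 / (1.0 + lam * T)


def duty_factor_from_times(V: float, lam: float, T: float) -> float:
    """eta as elapsed virtual time over elapsed virtual time plus page wait time."""
    page_wait = V * lam * T          # expected time spent in page wait
    return V / (V + page_wait)


T = 1000.0        # assumed traverse time (vtu)
lam = 0.001       # assumed missing-page probability
eta = duty_factor(lam, T)
print(eta, duty_factor_from_times(5000.0, lam, T))
```

With these assumed values the process spends half its time in page wait, which is the situation Section 4.10 associates with the parachor.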
4.6. $\tau$-sensitivity

It is useful to define a sensitivity function $s(\tau)$ that
measures how sensitive the re-entry rate $\lambda(\tau)$ is to changes in $\tau$.
We define the $\tau$-sensitivity $s(\tau)$ of a working set $W(t,\tau)$ to be

(4.6.1)    $s(\tau) = -\frac{d}{d\tau} \lambda(\tau) = f_x(\tau)$

That is, if $\tau$ is decreased by $d\tau$, the resulting increase in
re-entries to $W(t,\tau)$ is $s(\tau)\,d\tau$. It is obvious that $s(\tau) \ge 0$:
reducing $\tau$ can never reduce the page traffic.

Observe that $s(\tau)$ is the negative second derivative of $w(\tau)$,
and is therefore a measure of the convexity of $w(\tau)$.

$s(\tau)$ may be useful in deciding how small a value of $\tau$ to
choose. If $f_x(\tau)$ has the shape shown in Figure 4-5, curve A,
a good choice for $\tau$ is $\tau \approx \tau_a$ since $\tau > \tau_a$ has little effect on
reducing $s(\tau)$. If $f_x(\tau)$ has the shape of curve B we should have
to choose $\tau = \tau_b > \tau_a$ in order to have the same $\tau$-sensitivity. There
is good reason to believe that in practice $f_x(\tau)$ is approximately
hyperexponential, in which case curve A is more representative
than curve B.
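One way to make this choice concrete is sketched below: for an assumed two-branch hyperexponential density, scan a grid of $\tau$ values and take the smallest one at which $s(\tau) = f_x(\tau)$ has fallen to a desired level. The mixture weights, rates, target level, and function names are all hypothetical.

```python
import math


def hyperexponential_density(u: float, a: float, beta1: float, beta2: float) -> float:
    """f_x(u) for a two-branch hyperexponential mixture (assumed form)."""
    return a * beta1 * math.exp(-beta1 * u) + (1.0 - a) * beta2 * math.exp(-beta2 * u)


def smallest_tau_below(level: float, density, step: float = 1.0, limit: float = 10_000.0):
    """Smallest grid point tau with s(tau) = f_x(tau) <= level, or None if none is found."""
    tau = 0.0
    while tau <= limit:
        if density(tau) <= level:
            return tau
        tau += step
    return None


# Assumed parameters: most intervals are short, a few are very long.
density = lambda u: hyperexponential_density(u, a=0.9, beta1=0.1, beta2=0.001)
level = 0.001                     # desired tau-sensitivity (assumed)
tau = smallest_tau_below(level, density)
print(tau, density(tau))
```

Because this density decreases monotonically, the first grid point below the level marks the start of the flat region for the chosen threshold.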
Figure 4-5. Using $s(\tau)$ to choose $\tau$.

4.7. Choosing $\tau$

Ideally, the working set parameter $\tau$ should be chosen as
small as possible and yet assure that the working set $W_p(t,\tau)$ of
process $p$ contains $p$'s favored pages. In principle, then, $\tau$ should
be variable, from process to process and from time to time.

In practice, it will be necessary to choose non-ideal values
for $\tau$, because the optimum value for $\tau$ may be indeterminable,
or because too much mechanism may be needed either to decide
on the required value of $\tau$ or else to vary $\tau$ dynamically as
required. Thus, system parameters, as well as program
parameters, will play roles in choosing $\tau$.

Should $\tau$ be too small, the favored pages of a process will
be removed, resulting in high missing-page probability, high
memory-usage costs, high page traffic, and low efficiency (i.e.,
duty factors). Should $\tau$ be too large, pages may remain in
memory long after last being used, thus wasting memory and again
resulting in high memory-usage costs.

We shall attempt to clarify the nature of the tradeoffs 
among all these factors. 

Strange as it may seem, there may be a worst value for $\tau$.
Suppose the process in question has executed for $V$ vtu. The
expected number of page waits is $V\lambda(\tau)$, the expected time spent
in page wait is $V\lambda(\tau)T$, and the expected elapsed real time is

(4.7.1)    $V + V\lambda(\tau)T = V\,(1 + \lambda(\tau)T)$

During this interval $V$, the expected working set size is $w(\tau)$,
so that the expected cost per unit virtual time is

(4.7.2)    $H(\tau) = \frac{w(\tau)\, V\,(1 + \lambda(\tau)T)}{V} = w(\tau)\,(1 + \lambda(\tau)T)$

In Figure 4-6 we have sketched $w(\tau)$, $(1+\lambda(\tau)T)$, and the product
$H(\tau)$. It is clear that $H(\tau)$ attains a maximum for some $\tau_0 > 0$,
if $(1+T) > \bar{x}$. If $T \gg 1$ it is not hard to see that $\tau_0$ is very small,
and the value of $\tau$ chosen to permit inclusion of at least the
favored pages will satisfy $\tau \gg \tau_0$. There are values of $T$
satisfying $(1+T) < \bar{x}$ such that $H(\tau)$ has no maximum for finite $\tau$, in
which case we need not worry about a worst value of $\tau$.

Note that $H(\tau)$ has a maximum at $\tau_0$, whereas the cost function
$G(s)$ of Section 3.5.1, as a function of memory space $s$, has a
minimum. The apparent discrepancy is resolved if we note that
$H(\tau)$ cannot account for a memory holding larger than the expected
working set size $w(\tau)$, whereas $G(s)$ can. The functions $G(s)$
and $H(\tau)$ are not the same cost function.
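The worst value of $\tau$ can be located numerically. The sketch below does so for the exponential case of Section 4.9, where $\lambda(\tau) = e^{-\tau/\bar{x}}$ and $w(\tau) = \bar{x}(1 - e^{-\tau/\bar{x}})$. The closed form used for comparison, $\tau_0 = \bar{x}\ln(2T/(T-1))$ for $T > 1$, is obtained here by setting the derivative of $H$ to zero; it is an assumption of this illustration and is not stated in the text. The numerical values are likewise assumed.

```python
import math


def cost_per_vtu(tau: float, x_bar: float, T: float) -> float:
    """H(tau) = w(tau) * (1 + lam(tau) * T) for exponential interreference intervals."""
    lam = math.exp(-tau / x_bar)
    w = x_bar * (1.0 - lam)
    return w * (1.0 + lam * T)


x_bar, T = 50.0, 1000.0                            # assumed values (vtu)
grid = [0.01 * k for k in range(1, 50_001)]        # tau in (0, 500]
tau_worst = max(grid, key=lambda t: cost_per_vtu(t, x_bar, T))

# Stationary point of H in the exponential case (derived for this sketch only).
tau_closed = x_bar * math.log(2.0 * T / (T - 1.0))
print(tau_worst, tau_closed, cost_per_vtu(tau_worst, x_bar, T))
```

For these values the maximum of $H$ lies far above its limiting value $\bar{x}$, so a $\tau$ near $\tau_0$ would be a poor choice.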

The remaining tradeoff issues fall into classes: those that
depend on the behavior of the program, and those that depend on
system requirements. The program-dependent considerations are:

1. Hard vs. Soft Programs (cf. Section 3.3, Figure 3-9).
   Choose $\tau$ as small as possible, yet allow $W(t,\tau)$ to
   contain the favored pages. A hard program has a
   well-defined minimum value of $\tau$, whereas a soft program does
   not.

2. $\tau$-sensitivity (cf. Section 4.6). $\tau$ can be chosen so
   that $s(\tau)$ is at some desired level, or that $\tau$ is at
   the start of a flat region of the $s(\tau)$ curve.
The system-dependent considerations are:

3. Paging rate (cf. Sections 4.2, 4.3). $\tau$ can be chosen
   so that the virtual time between page faults is
   comparable to $T$; that is, so that $1/\lambda(\tau) = T$. This is
   equivalent to the condition

       $p(\tau) = \frac{\lambda(\tau)}{1 + \lambda(\tau)T} = \frac{1}{2T}$

4. Duty factor (cf. Section 4.5). $\tau$ can be chosen so that
   a given duty factor $\eta(\tau)$ is attained for each
   computation.

Figure 4-6. Cost per unit virtual time $H(\tau)$.
Of course, $\tau$ should never be chosen less than whatever is
required to satisfy the program-dependent criteria, conditions 1
and 2. In most contemporary systems $T$ is so large ($T \approx 10000$ vtu
= 10 ms.) that $\tau$ must be chosen to satisfy the system-dependent
criteria, conditions 3 and 4; this will generally cause $\tau$ to be
an order of magnitude or more greater than program-dependent
criteria would require.

Ideally we should like to have the flexibility to choose $\tau$
according to the program-dependent criteria, without regard to
the system-dependent criteria. It should be clear that this is
achievable only when $T$ becomes much smaller than is normal in
contemporary systems; for example, $T$ less than 100 vtu. The
use of bulk core storage or some other non-rotating device for
the second level of memory can achieve this. We shall return
to these issues in Chapter 8.

4.8. Prediction

On several occasions we have noted that a working set
$W(t,\tau)$ is a reliable prediction of the working set $W(t+\alpha,\tau)$, if
$\alpha$ is not too large; similarly the working set size $\omega(t,\tau)$ is
a reliable prediction of the working set size $\omega(t+\alpha,\tau)$, if $\alpha$
is not too large. Without going into great detail we want to
indicate how these ideas can be made more precise.

The prediction problem for working set sizes is:

    given: $\omega(t,\tau)$ for $t \in I$
    want to estimate: $\omega(t+\alpha,\tau)$ for $t+\alpha \notin I$
    the estimate is to be: $\hat{\omega}(t+\alpha,\tau) = G(\omega(s,\tau))$ for $s \in I$

Here, $I$ is a set of time points at which the value of $\omega(t,\tau)$ is
known. This set $I$ could consist of one or more distinct points,
of a time interval $(t_1,t_2)$, or even the entire time history since
$t=0$. The transformation $G$ is to be chosen so that $\hat{\omega}(t+\alpha,\tau)$ is
an optimum (in terms of a given criterion) estimate of $\omega(t+\alpha,\tau)$.

We assume that $\omega(t,\tau)$ is a stationary stochastic process;
hence we can write its expectation independent of time:

(4.8.1)    $w(\tau) = \overline{\omega(t,\tau)}$

and we can define the autocorrelation function between $\omega(t,\tau)$
and $\omega(t+u,\tau)$ to be

(4.8.2)    $R(u,\tau) = \overline{\omega(t,\tau)\,\omega(t+u,\tau)}$

depending only on the separation $u$ of the two times.

The most common form of prediction is least mean square
prediction, used because it is particularly easy to analyze.
Define the error of the estimate to be

(4.8.3)    $e(\alpha) = \omega(t+\alpha,\tau) - G(\omega(u,\tau))$,  $u \in I$

The problem is to choose the transformation $G$ such that the mean
square error, $\overline{e^2(\alpha)}$, is minimum. Clearly, the smallest mean
square error can be obtained if we impose no restrictions on $G$
(non-linear mean square prediction). This, however, leads to
practical and analytic difficulties, so $G$ is usually restricted
to be a linear operator. When $G$ is a linear operator, the mean
square error $\overline{e^2(\alpha)}$ is minimum if and only if the error $e(\alpha)$ is
orthogonal to all the given data (this result is well known; see,
for example, reference [P1, p. 389]); that is,

(4.8.4)    $\overline{[\omega(t+\alpha,\tau) - G(\omega(u,\tau))]\,\omega(v,\tau)} = 0$  for each $v \in I$

The most convenient linear operator is a linear combination
of a finite number of data. That is, $\omega(t,\tau)$ is known at the
time instants $t_1,\ldots,t_n$, and the estimate is to be a linear
combination

(4.8.5)    $\hat{\omega}(t+\alpha,\tau) = A_1\,\omega(t_1,\tau) + \ldots + A_n\,\omega(t_n,\tau) + A_{n+1}$

The constants $A_1,\ldots,A_{n+1}$ must be chosen to satisfy eq. 4.8.4;
that is, so that

(4.8.6)    $\overline{[\omega(t+\alpha,\tau) - (A_1\,\omega(t_1,\tau) + \ldots + A_n\,\omega(t_n,\tau) + A_{n+1})]\,\omega(t_i,\tau)} = 0$

for $i=1,\ldots,n$, and

(4.8.7)    $\overline{[\omega(t+\alpha,\tau) - (A_1\,\omega(t_1,\tau) + \ldots + A_n\,\omega(t_n,\tau) + A_{n+1})]\,A_{n+1}} = 0$

If one expands these for each $i$, one obtains $(n+1)$ equations in
$(n+1)$ unknowns (the $A_i$) with coefficients of the form

(4.8.8)    $\overline{\omega(t_i,\tau)\,\omega(t_j,\tau)} = R(t_i - t_j,\tau)$

which follows from eq. 4.8.2.
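For the simplest case $n=1$, the two orthogonality conditions can be solved in closed form. The sketch below is only an illustration: it replaces the expectations by sample averages over an observed series, and the function name and the series of working set sizes are hypothetical.

```python
def fit_predictor(series, lag: int):
    """Return (A1, A2) for the estimate A1 * omega(t) + A2 of omega(t + lag); lag >= 1."""
    xs = series[:-lag]           # known values omega(t)
    ys = series[lag:]            # values to be predicted, omega(t + lag)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    r_xx = sum(x * x for x in xs) / n                # sample analogue of R(0, tau)
    r_xy = sum(x * y for x, y in zip(xs, ys)) / n    # sample analogue of R(lag, tau)
    # Orthogonality of the error to the datum and to the constant term:
    a1 = (r_xy - mean_x * mean_y) / (r_xx - mean_x * mean_x)
    a2 = mean_y - a1 * mean_x
    return a1, a2


sizes = [12, 14, 15, 13, 16, 18, 17, 15, 14, 16]   # hypothetical working set sizes
lag = 1
a1, a2 = fit_predictor(sizes, lag)
errors = [y - (a1 * x + a2) for x, y in zip(sizes[:-lag], sizes[lag:])]
print(a1, a2)
```

The residual errors sum to zero and are orthogonal to the data, which is the sample form of eqs. 4.8.6 and 4.8.7 for a single datum.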

We hope to have indicated with this overview how the problem
of predicting working set sizes might be made more precise,
and how an error can be determined for a given estimate and time
separation $\alpha$. We refer the reader to the literature for further
detail [P1, p. 385ff.].

4.9. Example

It is interesting to examine the results of the previous
sections in the case of exponentially distributed interreference
intervals:

    $F_x(u) = 1 - e^{-\beta u}$,    $\beta = \frac{1}{\bar{x}}$

We have, for the major expressions:

Name                                         Symbol              Result, exponential case

missing-page probability                     $\lambda(\tau)$     $e^{-\beta\tau}$
mean vt interval between re-entries          $1/\lambda(\tau)$   $e^{\beta\tau}$
mean real time interval between re-entries   $1/p(\tau)$         $e^{\beta\tau} + T$
duty factor                                  $\eta(\tau)$        $\frac{1}{1 + T e^{-\beta\tau}}$
expected working set size                    $w(\tau)$           $\bar{x}\,(1 - e^{-\beta\tau})$



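Now, suppose we have chosen $\tau$ so that

    $\frac{1}{\lambda(\tau)} = T$

then

    $\tau = \bar{x} \ln T$

For this choice of $\tau$ we have

    $p(\tau) = \frac{1}{2T}$

    $\eta(\tau) = \frac{1}{2}$

    $w(\tau) = \bar{x} \left(\frac{T-1}{T}\right)$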
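The exponential-case results can be checked with a few lines of code. In this sketch the mean interreference interval and the traverse time are assumed values, and the function name and dictionary keys are hypothetical.

```python
import math


def exponential_case(x_bar: float, T: float, tau: float) -> dict:
    """Evaluate the table entries for exponential interreference intervals."""
    beta = 1.0 / x_bar
    lam = math.exp(-beta * tau)               # missing-page probability
    return {
        "lam": lam,
        "p": lam / (1.0 + lam * T),           # paging rate
        "eta": 1.0 / (1.0 + lam * T),         # duty factor
        "w": x_bar * (1.0 - lam),             # expected working set size
    }


x_bar, T = 50.0, 1000.0                       # assumed values (vtu)
tau = x_bar * math.log(T)                     # the choice 1/lam(tau) = T
result = exponential_case(x_bar, T, tau)
print(tau, result)
```

For this choice of $\tau$ the computed values reproduce $\lambda(\tau) = 1/T$, $p(\tau) = 1/(2T)$, $\eta(\tau) = 1/2$, and $w(\tau) = \bar{x}(T-1)/T$ up to rounding.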






4.10. Working Sets and Parachors 

Belady has defined a unit of storage allocation, the parachor
[B2], which is that amount of information that must be in
main memory so that a program spends no more than half its time
in page wait. If we choose $\tau$ so that

    $\lambda(\tau) = \frac{1}{T}$

we find that the duty factor is $\eta(\tau) = \frac{1}{2}$. Hence the expected
working set size for this value of $\tau$ corresponds to one parachor.
In the exponential case, this is

    $w(\tau) = \bar{x} \left(\frac{T-1}{T}\right)$

Allocating one parachor to each program is the same as allocating
enough space for its expected working set size. The parachor
is a static unit of allocation, whereas the working set size
$\omega(t,\tau)$ is a dynamic unit of allocation. Our results in Chapter 3
show that working set strategies should perform better than parachor
strategies (a parachor strategy is one that runs a process
if and only if there is at least one uncommitted parachor of
main memory).

4.11. Implementation of Working Set Memory Management

According to our definition, $W(t,\tau)$ is the set of pages
a process has referenced within the last $\tau$ vtu of its execution.
This suggests that memory management can be controlled with
hardware mechanisms, by associating with each page-block of main
memory a timer. Whenever a page is referenced, its timer is set
to $\tau$ and begins to run down; if the timer succeeds in running
down, a flag is set to mark the page for removal from main memory
whenever the space is needed.

Unfortunately matters are not so simple. According to the
definition of $W(t,\tau)$, the timers must run down in virtual time.
Virtual time coincides with real time only when the process is
running. More precisely, the timer behavior should be as follows,
for each process state:

1. running. A timer may run down in real time.

2. page wait. Since the process is temporarily suspended,
   all timers on its working set pages must be stopped,
   else they may run down and working set pages may be
   removed during a page wait.

3. ready and blocked. If a process is pre-empted by the
   operating system, or blocks, its timers may continue to
   run down in real time; then, within $\tau$ vtu, the memory
   it formerly occupied will be freed.

We can see that it is the page wait state that gives the trouble.
Whenever a process enters page wait its page timers must stop
until the new page is acquired. For other process states, the
page timers may run in real time. Therefore we shall associate
with each page-block in main memory the name of the process
that has most recently referenced it; when the process enters
page wait, all its pages can have their timers suspended.

The following procedure is useful in a software as well as
a hardware implementation, and is therefore potentially
applicable in contemporary systems. The procedure we propose here
samples use bits associated with each page; these use bits may
be part of a page table entry or part of a hardware register.
Sampling occurs at intervals of $\sigma$ vtu, $\sigma$ being called the
sampling interval, where $\sigma = \tau/K$, and $K$ is an integer constant
chosen to make the sampling intervals as fine as desired ($K=2$
or 3 should be sufficient). On the basis of page references
during each of the last $K$ sampling intervals, the working set
$W(t,K\sigma)$ can be determined.

There is a sequence of use bits $u_0, u_1, \ldots, u_K$ associated
with each page. Whenever a reference occurs, $1 \to u_0$. At the
end of each sampling interval, the bit pattern contained in
$u_0, u_1, \ldots, u_K$ is shifted one position, a 0 enters $u_0$, and $u_K$
is discarded:

    $u_{K-1} \to u_K$
        ...
    $u_0 \to u_1$
    $0 \to u_0$

Then the logical sum $U$ of the use bits

    $U = u_0 + u_1 + \ldots + u_K$

is $U=1$ if and only if the page in question has been referenced
during the last $K$ sampling intervals; of all the pages associated
with a process, those with $U=1$ constitute its working set $W(t,K\sigma)$.

Figure 4-7 shows how this idea might be implemented in
hardware. If process $j$ is currently using the page, the $\pi$-field
of the page register contains an identifier to $j$. The PW bit is
0 if and only if $j$ is in page wait. The PT-field points to the
page table entry designating this page. The $\sigma$-bus is pulsed
once every $\sigma$ vtu; these pulses cause a shift in the use bits
if and only if PW=1 (the process is not in page wait). Whenever
the logical sum $U$ of use bits becomes 0, a mechanism (not shown)
may (not must) remove the page from main memory; this mechanism
will dispatch the page to auxiliary memory (unless it has not
been modified and there is a spare copy already in auxiliary
memory), and then (using PT) find the page table entry for this
page and set the in-core bit OFF. All this is done without
troubling the operating system.

This mechanism maintains a count of the working set size
for each process as follows. Whenever a fresh page of process
$\pi$ is loaded by the operating system (in response to a page fault),
increment a counter for the process $\pi$. Whenever the logical
sum $U$ of use bits becomes 0 for some page marked as belonging
to process $\pi$, decrement the counter for process $\pi$.

It is interesting to note that $\tau = K\sigma$ may be varied if
desired by varying $\sigma$. The operating system thus has control over
the current value of $\tau$.
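The use-bit procedure can be sketched in software as follows. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, page wait is ignored (every interval boundary causes a shift), and the working set size is obtained by counting pages with $U=1$ rather than by the incremental counter described above.

```python
class PageUseBits:
    """Use bits u_0..u_K for one page (hypothetical helper, not from the text)."""

    def __init__(self, K: int) -> None:
        self.bits = [0] * (K + 1)        # bits[k] holds u_k

    def reference(self) -> None:
        self.bits[0] = 1                 # 1 -> u_0 on every reference

    def end_of_interval(self) -> None:
        # Shift one position: u_{k-1} -> u_k, 0 -> u_0, old u_K is discarded.
        self.bits = [0] + self.bits[:-1]

    def in_working_set(self) -> bool:
        return any(self.bits)            # logical sum U of the use bits


def working_set(pages: dict) -> set:
    """Names of the pages whose logical sum U is 1."""
    return {name for name, bits in pages.items() if bits.in_working_set()}


K = 2
pages = {"a": PageUseBits(K), "b": PageUseBits(K)}
pages["a"].reference()
pages["b"].reference()

history = []
for interval in range(K + 1):
    for bits in pages.values():
        bits.end_of_interval()           # pulse at the end of a sampling interval
    pages["a"].reference()               # page "a" is referenced in every interval
    history.append(sorted(working_set(pages)))
print(history)
```

In this run page "b" is referenced once and then left alone; with $K+1$ use bits it drops out of the working set at the $(K+1)$-th shift after its last reference, while page "a" stays in throughout.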

This basic scheme can also be realized in software, as
suggested by Figure 4-8. All processes in the running state
are identified in the running list. Upon entry to the running
state, process $i$ is assigned some quantum $q_i$. A process cycles
through the list, receiving a burst $\sigma$ ($\sigma$ is the sampling interval)
at each pass; the quantity $\gamma_i$ records its time used. There is
a special process, the checker; whenever run, the checker looks
at the page tables of processes run since the last time it was
run, and performs the use-bit shift discussed above (the use
bits $u_0, u_1, \ldots, u_K$ are stored in the page tables, so the $\pi$, PW,
and PT fields of Figure 4-7 are no longer necessary).

Figure 4-7. Hardware implementation of memory management.
(Labels: process name; use bits; page wait / not in page wait;
reference; $\sigma$-bus, carrying stream of pulses, once each $\sigma$ vtu;
main memory.)

Figure 4-8. Software implementation of memory management.
(Labels: running list; checker; run on a processor for burst $\sigma$;
burst over; page wait, $\gamma_i < q_i$; quantum runout, to ready state;
blocked, to blocked state; quit; from ready state with quantum $q_i$.)

Associated with each process is a counter $w_i$ giving its
current working set size. At each page fault for process $i$,
$w_i$ is increased by one. If the checker observes a page leave
the working set of process $i$, $w_i$ is decreased by one.

It should be clear that, if the length of the running list
is $n$, the checker samples page use bits only every $n\sigma$ seconds,
not every $\sigma$ seconds.

This implementation is also discussed in reference [D4].

4.12. Summary

We refined the working set model, deriving expressions for
the missing-page probability $\lambda(\tau)$, the paging rate $p(\tau)$, the
expected working set size $w(\tau)$, the variance of the working
set size $\sigma_\omega^2(\tau)$, the duty factor $\eta(\tau)$, and the $\tau$-sensitivity $s(\tau)$.
Each of these depends on the interreference distribution $F_x(u)$,
and some of them depend also on the traverse time $T$. We showed
how each of these plays a role in selecting a value for $\tau$.

We discussed the problem of prediction, showing general 
methods whereby errors may be determined precisely for a given 
mode of estimation. 

We discussed implementation of working set memory allocation 
strategies, both for hardware and software. 

All this was done in the absence of sharing. In the next
chapter we investigate the effects of sharing and show
quantitatively that great benefits are attainable.

CHAPTER 5



Multiprocess Information Sharing 



5.0. Introduction 

In this chapter we complete the characterization of the
working set model, by investigating the effects of sharing.
Intuition already tells us that sharing should produce an
all-around improvement. Our purpose here is to give quantitative
justification to this well known premise.

Under the existing definition, working sets will overlap
when their processes share information. This complicates the
problem of charging for main memory usage of shared information,
because the number of overlaps among shared working sets (there
can be as many as $(2^n - 1)$ overlaps among $n$ working sets), their
sizes, and their contents, may be unknown or at best exceedingly
difficult to determine. A minor modification of the definition
makes working sets disjoint, thereby relieving these difficulties.

We consider a simple conceptual experiment: $n$ independent
processes referencing $n$ identical programs compared to $n$
independent processes referencing one program. We derive expressions
for working set size, missing-page probability, paging rate, and
duty factor. We show that sharing produces improvement in each
of these quantities.

The discussion here in this chapter is not intended to
solve the problems of sharing information. We only hope to shed
light on the difficulties of the problem and to give insights
into possible solutions.

5.1. Sharing

5.1.1. General Aspects 

The smallest unit of information that can be shared is,
from a process's point of view, a segment, because the protection
mechanism operates on a segment level. The smallest unit of
information that can be shared is, from the system's point of view,
a page, because memory allocation is handled on a page level.

We follow Arden's suggestions for program structure [A1].
If a segment is shared, there will be an entry for it in the
segment table of each participating process. Each such entry need
not assign the same name to the segment. Each such entry,
however, points to the same page table. Thus, each physical segment
has exactly one page table describing it.

The problem of charging participants for the use of shared 
information can be handled at two levels : shared information in 
main memory, and shared information not in main memory. 

Ideally, we should like each participant in sharing of in- 
formation which resides in main memory to be charged in accor- 
dance with his degree of participation. Even though this may 
not be easy to implement, an extension (Section 5.1.2) of the 
working set concept can give insights into how this might be done, 
and how an implementation can approximate this ideal. 

When working sets overlap, the existing working-set
definition leads to the following difficulty. Suppose computation $C$
contains two processes, designated 1 and 2, which are sharing
information. Then

    $W_1(t,\tau) \cap W_2(t,\tau) \ne \emptyset$

where $\emptyset$ is the empty set. Then the joint working set for
computation $C$ is

    $W_C(t,\tau) = W_1(t,\tau) \cup W_2(t,\tau)$

and the working set size of $C$ is

    $\omega_C(t,\tau) < \omega_1(t,\tau) + \omega_2(t,\tau)$

Thus, measuring the individual working set sizes of the
component processes of a computation will lead to an overestimate of
the true joint working set size. When there is much sharing,
working sets will be very nearly coincident; summing the sizes
of each process's working set will grossly overestimate the
true joint working set size. This can seriously complicate the
problem of attributing memory-usage charges to the participants.

In the next section we shall introduce an alternative
working set definition that facilitates the accounting and billing
procedures by making working sets always disjoint.

The method most frequently proposed for handling charges
on the non-main-memory shared information is based on a concept
of ownership. Each segment is assigned exactly one owner. Anyone
wishing to use another's segment must make arrangements to
do so with the owner. The owner is charged for use of the
segment, regardless of who is actually using it; he is in turn
paid royalties by borrowers, these fees fixed to defray those
expenses charged to him because borrowers have used his segment.
One of the chief motivations for this method is simplicity of
implementation: because an arbitrarily large and unpredictably
varying number of processes may wish to share a single segment,
it can become unbelievably complex to keep track of, and
attribute charges to, every participant. If this method is used for
information shared in main memory, the following inequity will
result. Two users sharing the same segment both pay the same
fee to the owner (there is no way to determine in advance how
much a given user will use it); yet one user may use it
sparingly, the other heavily. Thus, costs of sharing may not be
distributed fairly.

The owner method of charging for sharing very much resembles
copyrights. A similar problem is the so-called proprietary
software problem, in which a firm or user may lease programs
to other firms or users.

It is apparent that, if the owner method is used, it must
be augmented in order to distribute main memory costs more
equitably.



5.1.2. Refinement of The Working Set Definition 

The basic idea we use here is: rather than associate with
each process the pages it has most recently referenced, we
associate each page with the process that has most recently
referenced it.

Page $i$ belongs to the working set $W_p(t,\tau)$ of process $p$
if and only if:

1. $p$ has referenced $i$ most recently at time $s$ in its
   virtual time interval $(t-\tau,t)$,

and

2. no other process has referenced $i$ in $p$'s virtual time
   interval $(t-s,t)$.



Thus,

    $W_p(t,\tau) = \{\, i \mid \text{the most recent reference to } i \text{ originated from process } p \text{, in } p\text{'s vt interval } (t-\tau,t) \,\}$

This definition has two consequences:

1. Disjoint working sets. A page is in at most one working
   set. Therefore if $\omega_p(t,\tau)$ is the working set size for each
   process $p$, and if $Q$ is any collection of processes,
   then the size of their joint working set is

       $\omega_Q(t,\tau) = \sum_{p \in Q} \omega_p(t,\tau)$

   We may therefore compute the memory demand of any
   collection of processes simply by adding their working
   set sizes.

2. Fair distribution of costs. Suppose processes $p_1,\ldots,p_r$
   have been sharing page $i$, independently, for some
   interval of time, and let $n_j$ denote the number of
   references process $p_j$ made to page $i$. Then, on the average,
   page $i$ spends a fraction

       (5.1.1)    $f_j = \frac{n_j}{n_1 + \ldots + n_r}$

   of its time in the working set $W_{p_j}(t,\tau)$, and has
   contributed $f_j$ to the size of $p_j$'s working set. Thus,
   a participant is charged in accordance with his degree
   of participation.
This last relation, eq. 5.1.1, holds only if $p_1,\ldots,p_r$ behave
independently. If the shared information is modifiable and
protected by interlocks, then the likelihood of correlation is
very high. In general, there is no easy way to determine how
an interlocked, modifiable piece of data will affect whatever
processes attempt to use it, because of data dependence and
arbitrary timing.
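The refined definition can be sketched as a small bookkeeping structure in which each page records the process that referenced it most recently. This is an illustration under simplifying assumptions: each reference advances the referencing process's virtual clock by one unit, and the class and method names are hypothetical.

```python
from collections import defaultdict


class SharedWorkingSets:
    """Each page belongs to the process that most recently referenced it."""

    def __init__(self, tau: int) -> None:
        self.tau = tau
        self.vt = defaultdict(int)   # virtual clock of each process
        self.last = {}               # page -> (process, that process's vt at the reference)

    def reference(self, page: str, proc: int) -> None:
        self.vt[proc] += 1           # assumed: one reference costs one vtu
        self.last[page] = (proc, self.vt[proc])

    def working_set(self, proc: int) -> set:
        now = self.vt[proc]
        return {page for page, (owner, when) in self.last.items()
                if owner == proc and now - when < self.tau}


ws = SharedWorkingSets(tau=10)
for page, proc in [("a", 1), ("b", 1), ("a", 2), ("c", 2), ("b", 1)]:
    ws.reference(page, proc)

w1, w2 = ws.working_set(1), ws.working_set(2)
print(w1, w2, len(w1) + len(w2))
```

Because a page has exactly one most recent referencer, the two sets cannot overlap, and the joint working set size is simply the sum of the individual sizes.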
5.1.3. Implementation

The implementation of Section 4.11 and Figure 4-7 remains
unchanged. Refer now to Figure 5-1. The $\pi$-field shown there
in the page register, whose contents designate the process whose
working set contains the page, now is loaded at each reference
with the name of the process making the reference.

To be more precise, an information reference is a pair
$(i,p)$ where $i$ is the name of the page being referenced and $p$ is
the name of the process making the reference. The only
modification of Figure 4-7 is simply that $p$ is loaded into the $\pi$-field
of the page register as the reference is made.

If $\tau < T$, the following difficulty arises. If the process
named in the $\pi$-field enters page wait, we must be sure that
another process does not borrow the page and then discard it
before the $\pi$-field process completes its page wait. For example,
suppose at time $t$ process 1 enters page wait. If
process 2 (which is not in page wait during the interval $(t,t+T)$)
references a page in process 1's working set just once in
the interval $(t,t+T-\tau)$, the page will exit process 2's working
set before process 1 terminates page wait, and will not be
available for use by process 1.

There is no easy solution to this difficulty. One
possibility is to choose $\tau > T$; but if $T$ depends on the rotation time
of a device, this may result in undesirably large values of $\tau$.
Another possibility (shown in Figure 5-1) is to prevent a change
in the contents of the $\pi$-field when the process named there is
in page wait; but then other processes may obtain references
without paying.

Figure 5-1. Implementation of shared memory management.

System 1 represents the completely unshared case, and is
the poorest performance situation. System 2 represents the
completely shared case, the best performance situation.

In the following sections, we evaluate the quantities 
described in the following table. 

quantity                    symbol             description

expected working set size   $w(\tau,n)$        expected number of pages in the
                                               joint working set $W(t,\tau,n)$
                                               of n processes.

missing-page probability    $\lambda(\tau,n)$  probability that a process
                                               references a page not in the
                                               joint working set $W(t,\tau,n)$.

paging rate                 $p(\tau,n)$        number of pages per unit real
                                               time re-entering $W(t,\tau,n)$
                                               on behalf of one process.

duty factor                 $\eta(\tau,n)$     fraction of time a running or
                                               page wait process is running,
                                               when (n-1) others share the
                                               program Z with it.



In each case we show that sharing is an improvement. That is,
for $n>1$ and $\tau>0$ we show that

    $\lambda(\tau,n) < \lambda(\tau,1)$

    $p(\tau,n) < p(\tau,1)$

    $\eta(\tau,n) > \eta(\tau,1)$

and

    $\frac{w(\tau,n)}{n} < w(\tau,1)$

This last relation differs slightly from the others for the
following reason. There may exist a shared page that no single
process references often enough to keep continuously in main
memory, but that, together, the n processes reference often
enough to keep it continuously in main memory. Thus, the joint
working set will be larger than any single working set:
$w(\tau,n) > w(\tau,1)$. In the shared case, however, each
process pays for $1/n$ of the memory used, so its expected cost
depends on $w(\tau,n)/n$. By showing $w(\tau,n)/n < w(\tau,1)$
we show that sharing reduces costs.
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5.3. Shared Working Set Size and Memory Costs 

In Figure 5-2, let $W(t,\tau,n)$ denote the joint working set
of the n processes, and define

(5.3.1)    $w(\tau,n)$ = expected size of $W(t,\tau,n)$

When n=1, $w(\tau,1)$ is exactly the expected working set size
discussed in Section 4.4. We shall obtain an expression for
$w(\tau,n)$ and show that the expected memory cost $w(\tau,n)/n$
for one process is diminished for increased n.

Theorem 5.1. Let n statistically independent processes be
    simultaneously sharing the same program Z, whose size is
    fixed at z. The interreference distribution $F_x(u)$ is the
    same for each process, and is unchanging. Define the integral

(5.3.2)    $I(\tau) = \frac{1}{\bar{x}} \int_0^\tau (1 - F_x(u))\,du$

    Then the expected size of the joint working set of the
    n processes is

(5.3.3)    $w(\tau,n) = z\,[1 - (1 - I(\tau))^n]$

Discussion: Note that, for n=1, we have

    $w(\tau,1) = z\,[1 - (1 - I(\tau))] = z\,I(\tau) = \int_0^\tau (1 - F_x(u))\,du$

where we have used $z = \bar{x}$ from eq. 5.2.1 (and Theorem 4.1).
Thus, the expression reduces to that of Theorem 4.4, in the
unshared case.

Proof of Theorem 5.1: We follow an argument similar to that of
Theorem 4.4, in which we derived the expected working set size
for one process. Let $W_{p_j}(t,\tau)$ be the working set of
process $p_j$, and let $\omega_{p_j}(t,\tau)$ be its size. Define
the binary random variable

(5.3.4)    $\gamma_i = 1$ if page i is in $W_{p_j}(t,\tau)$,
           $\gamma_i = 0$ otherwise

Observe that $\gamma_i = 1$ if and only if $p_j$ referenced page i
in $(t-\tau,t)$. Then

(5.3.5)    $\omega_{p_j}(t,\tau) = \sum_{i \in Z} \gamma_i$

and

(5.3.6)    $w(\tau) = \overline{\omega_{p_j}(t,\tau)} = \sum_{i \in Z} \overline{\gamma_i} = z \Pr[\gamma_i = 1]$

Now, from eqs. 4.4.8 and 4.4.9 we have

(5.3.7)    $\Pr[\gamma_i = 0] = 1 - I(\tau)$
           $\Pr[\gamma_i = 1] = I(\tau)$

where $I(\tau)$ stands for

(5.3.8)    $I(\tau) = \frac{1}{\bar{x}} \int_0^\tau (1 - F_x(u))\,du = \frac{w(\tau)}{\bar{x}}$

Now, let $W(t,\tau,n)$ stand for the joint working set of the
n processes, and $\omega(t,\tau,n)$ denote its size. Define the
binary random variable

(5.3.9)    $\delta_i = 1$ if page i is in $W(t,\tau,n)$,
           $\delta_i = 0$ otherwise

Observe that $\delta_i = 1$ if and only if some one of the
processes $p_1,\ldots,p_n$ has referenced page i in the interval
$(t-\tau,t)$. Then

(5.3.10)   $\omega(t,\tau,n) = \sum_{i \in Z} \delta_i$

and

(5.3.11)   $w(\tau,n) = \overline{\omega(t,\tau,n)} = \sum_{i \in Z} \overline{\delta_i} = z \Pr[\delta_i = 1]$

We must find $\Pr[\delta_i = 1]$. Now, the n processes are
statistically independent; then

    $\Pr[\delta_i = 0]$
        = Pr[no reference from any of n processes to page i
             in the virtual-time interval $(t-\tau,t)$]
        = (Pr[no reference from one process to page i
             in the virtual-time interval $(t-\tau,t)$])$^n$
        = $(\Pr[\gamma_i = 0])^n$

(5.3.12)   $\Pr[\delta_i = 0] = (1 - I(\tau))^n$

thus

(5.3.13)   $\Pr[\delta_i = 1] = 1 - (1 - I(\tau))^n$

Putting this into eq. 5.3.11, we obtain

    $w(\tau,n) = z\,[1 - (1 - I(\tau))^n]$

QED.



Define the expected memory usage of one of the n processes
to be

(5.3.14)   $m(\tau,n) = \frac{w(\tau,n)}{n}$

so that $m(\tau,n)$ measures the expected cost per unit virtual
time attributed to one of the n processes.

Theorem 5.2. Let $m(\tau,n) = w(\tau,n)/n$ be the expected memory
    usage of one of the n processes, where $w(\tau,n)$ is given by
    Theorem 5.1, and depends only on the interreference
    distribution $F_x(u)$. Then for $\tau>0$ and $n>1$,

(5.3.15)   $m(\tau,n) < m(\tau,1)$

Discussion: Theorem 5.2 asserts that sharing reduces costs.
This result is very strong, for it depends only on the arbitrarily
given distribution $F_x(u)$. In other words, whenever two or more
processes are sharing the same program Z: provided that Z
remains fixed, that $F_x(u)$ remains fixed, and that the processes
are run concurrently, the shared expected memory usage costs
are always less than the unshared memory usage costs. Put
another way, sharing is always an improvement under the stated
conditions, regardless of program behavior.



The reader might think there are counterexamples. For example, 
let the n processes share an interlocked section. A process 
tests the interlock: if the interlock is ON, the process creates 
an enormous amount of data; if the interlock is OFF, the process 
turns it ON and works in the interlocked section. It is clear 
that n distinct copies require less memory than one shared copy, 
because in the shared case (n-1) processes will find the inter- 
lock ON, whereas in the unshared case no process finds the 
interlock ON. This violates the assumptions of the theorem, 
because the program size is not fixed in both cases, and because 
the interlock violates the assumption of statistical independence. 
Thus, this is not a counterexample.

Proof of Theorem 5.2: To prove $m(\tau,n) < m(\tau,1)$ we must
show that

(5.3.16)   $\frac{w(\tau,n)}{n} < w(\tau,1)$

From Theorem 5.1, this is the same as showing

(5.3.17)   $\frac{1 - (1 - I(\tau))^n}{n} < I(\tau)$    all $\tau>0$, $n>1$

This requires that

(5.3.18)   $1 > \frac{1 - (1 - I(\tau))^n}{n\,I(\tau)}$

or equivalently that

(5.3.19)   $1 > \frac{1 - (1 - I(\tau))^n}{n\,(1 - (1 - I(\tau)))}$

This expression is of the form

(5.3.20)   $1 > \frac{1 - A^n}{n\,(1 - A)}$,    $A = 1 - I(\tau) < 1$

Now, using the fact that

(5.3.21)   $\frac{1 - A^n}{1 - A} = 1 + A + \cdots + A^{n-1} < n$

which follows from $A<1$, we have

(5.3.22)   $1 = \frac{n}{n} > \frac{1}{n} \cdot \frac{1 - A^n}{1 - A}$

and the inequality is proved.

QED.
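Theorems 5.1 and 5.2 can be checked numerically. The following
Python sketch is illustrative only. It assumes an exponential
interreference distribution $F_x(u) = 1 - e^{-u/\bar{x}}$, an
assumption made here purely for convenience (the theorems do not
require it), for which $I(\tau) = 1 - e^{-\tau/\bar{x}}$. The
function names and parameter values are hypothetical.

```python
import math


def coverage_integral(tau, mean_interval):
    """I(tau) of eq. 5.3.2, evaluated for an ASSUMED exponential
    interreference distribution F_x(u) = 1 - exp(-u / mean_interval)."""
    return 1.0 - math.exp(-tau / mean_interval)


def joint_working_set_size(tau, n, z, mean_interval):
    """w(tau, n) = z * (1 - (1 - I(tau))**n), eq. 5.3.3."""
    i_tau = coverage_integral(tau, mean_interval)
    return z * (1.0 - (1.0 - i_tau) ** n)


def memory_usage_per_process(tau, n, z, mean_interval):
    """m(tau, n) = w(tau, n) / n, eq. 5.3.14."""
    return joint_working_set_size(tau, n, z, mean_interval) / n


if __name__ == "__main__":
    z = 100.0          # program size in pages (illustrative value)
    mean_interval = z  # the text uses z equal to the mean interval (eq. 5.2.1)
    tau = 50.0         # working set parameter (illustrative value)
    for n in (1, 2, 4, 8):
        w = joint_working_set_size(tau, n, z, mean_interval)
        m = memory_usage_per_process(tau, n, z, mean_interval)
        print(f"n={n}: w={w:.2f} pages, m={m:.2f} pages per process")
```

Under these assumptions the joint working set grows with n while
the per-process share shrinks, which is the content of Theorem 5.2.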

In Figure 5-2, define $M_1$ to be the total expected memory
requirement in system 1, and $M_2$ to be the total expected memory
requirement in system 2. We have the following rather obvious
corollary to Theorem 5.2.

Corollary 5.2. $M_1 > M_2$

Proof: By Theorem 5.2,

    $M_1 = n\,w(\tau,1) > w(\tau,n) = M_2$

QED.

Corollary 5.2 asserts simply that sharing reduces the overall
memory usage, resulting in more memory for other programs to use.
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5.4. Missing-Page Probability 

Define the missing-page probability to be:

(5.4.1)    $\lambda(\tau,n)$ = Pr[a given process references a missing
                               page, when (n-1) others share the same
                               program]

Thus, $\lambda(\tau,n)$ is the probability that a process directs
a reference to a page not in the joint working set $W(t,\tau,n)$.
Using reasoning similar to that of Theorem 4.2, we can see that
$\lambda(\tau,n)$ may be regarded as the number of pages per unit
virtual time re-entering the joint working set $W(t,\tau,n)$, on
behalf of one of the n processes. That is, the expected virtual
time interval between the page faults of one process is
$1/\lambda(\tau,n)$.

Theorem 5.3. Let $\lambda(\tau,n)$ be the missing-page probability
    as just defined, and let $F_x(u)$ be the interreference
    distribution. Then

(5.4.2)    $\lambda(\tau,1) = 1 - F_x(\tau)$

    and

(5.4.3)    $\lambda(\tau,n) = \lambda(\tau,1)\,(1 - I(\tau))^{n-1}$

    where

(5.4.5)    $I(\tau) = \frac{1}{\bar{x}} \int_0^\tau (1 - F_x(u))\,du$



Proof: From Theorem 3.3, the single-process, unshared
missing-page probability is
$\lambda(\tau) = 1 - F_x(\tau) = \lambda(\tau,1)$. To find
$\lambda(\tau,n)$ we note

    $\lambda(\tau,n)$
        = Pr[given process, say p, references page at time t
             and finds it missing]
        = Pr[p's most recent reference interval x to the page in
             question satisfies $x>\tau$, and the (n-1) other
             processes made no reference to the page in question
             during $(t-\tau,t)$]

Since the processes are statistically independent, this last
probability becomes

    $\lambda(\tau,n) = (\Pr[x>\tau])\,(\Pr[\text{no reference to page in } (t-\tau,t)])^{n-1}$
                     $= (1 - F_x(\tau))\,(1 - I(\tau))^{n-1}$
                     $= \lambda(\tau,1)\,(1 - I(\tau))^{n-1}$

where the probability $(1 - I(\tau))$ is obtained from the
arguments given in Theorem 5.1.

QED.



The next theorem asserts that sharing reduces the missing-page
probability.

Theorem 5.4. Under the given assumptions ($F_x(u)$ unchanging,
    processes running concurrently) the missing-page probability
    $\lambda(\tau,n)$ is reduced by sharing:

(5.4.7)    $\lambda(\tau,n) < \lambda(\tau,1)$    if $\tau>0$, $n>1$

Proof: Since $(1 - I(\tau)) < 1$ if $\tau>0$, it follows that
$(1 - I(\tau))^{n-1} < 1$, and thence
$\lambda(\tau,n) = \lambda(\tau,1)\,(1 - I(\tau))^{n-1} < \lambda(\tau,1)$.

QED.

Recalling the discussion of Chapter 3, in which we showed
that lower missing-page probabilities are equivalent to lower
memory-usage costs, Theorem 5.4 verifies that sharing reduces
memory-usage costs. Indeed, in many circumstances it will be
true that

    $\lambda(\tau,n) \ll \lambda(\tau,1)$

that is, sharing is a pronounced improvement.
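As a rough illustration of how quickly the missing-page probability
of Theorem 5.3 can fall with n, the sketch below evaluates eq. 5.4.3
under an assumed exponential interreference distribution. That
distribution, the function name, and the sample values are
assumptions made for this example only; in this special case both
$1 - F_x(\tau)$ and $1 - I(\tau)$ reduce to $e^{-\tau/\bar{x}}$.

```python
import math


def missing_page_probability(tau, n, mean_interval):
    """lambda(tau, n) = lambda(tau, 1) * (1 - I(tau))**(n - 1), eq. 5.4.3.

    ASSUMPTION: exponential interreference distribution, so that both
    1 - F_x(tau) and 1 - I(tau) equal exp(-tau / mean_interval).
    """
    lam_single = math.exp(-tau / mean_interval)   # lambda(tau, 1) = 1 - F_x(tau)
    one_minus_i = math.exp(-tau / mean_interval)  # 1 - I(tau)
    return lam_single * one_minus_i ** (n - 1)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        lam = missing_page_probability(50.0, n, 100.0)
        print(f"n={n}: lambda={lam:.5f}")
```

For this assumed distribution the probability decays geometrically
in n, consistent with the remark that the improvement can be
pronounced.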






5.5. Paging Rate 

Define the real time paging rate to be:

(5.5.1)    $p(\tau,n)$ = [real time paging rate of one process when
                          (n-1) others are sharing the same program]

That is, one process expects to see a real time interval of
length $1/p(\tau,n)$ between page waits.

Theorem 5.5. Let $p(\tau,n)$ be the real-time paging rate as
    defined above. Then:

(5.5.2)    $p(\tau,n) = \frac{\lambda(\tau,n)}{1 + \lambda(\tau,n)\,T}$

    where $\lambda(\tau,n)$ is the missing-page probability, defined
    in Theorem 5.3, and T is the traverse time.

Proof: In a virtual time interval of length V, one process
generates V information references and expects to encounter
$V\lambda(\tau,n)$ page waits. Therefore:

    $p(\tau,n) = \frac{\text{(number of page waits)}}{\text{(virtual time)} + \text{(page wait time)}} = \frac{V\lambda(\tau,n)}{V + V\lambda(\tau,n)\,T}$

QED.

If n=1 we obtain $p(\tau,1) = p(\tau)$, just like Theorem 4.3.

Let us compare the total page traffic in the two cases.
In Figure 5-2, let the total paging rates be denoted by

(5.5.3)    $\Psi_1(\tau) = n\,p(\tau,1)$
           $\Psi_2(\tau) = n\,p(\tau,n)$

We wish to show that sharing reduces page traffic.

Theorem 5.6. Let $\Psi_1(\tau) = n\,p(\tau,1)$ be the unshared total
    page traffic, and let $\Psi_2(\tau) = n\,p(\tau,n)$ be the shared
    page traffic. Then

(5.5.4)    $\Psi_2(\tau) < \Psi_1(\tau)$

Proof: We must show $p(\tau,1) - p(\tau,n) > 0$ if $n>1$. Consider

    $p(\tau,1) - p(\tau,n) = \frac{\lambda(\tau,1)}{1 + \lambda(\tau,1)\,T} - \frac{\lambda(\tau,n)}{1 + \lambda(\tau,n)\,T}$

                           $= \frac{\lambda(\tau,1) - \lambda(\tau,n)}{(1 + \lambda(\tau,1)\,T)(1 + \lambda(\tau,n)\,T)} > 0$

where we have used $\lambda(\tau,1) - \lambda(\tau,n) > 0$ from
Theorem 5.4.

QED.
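The comparison in Theorem 5.6 can be tried on sample numbers. The
sketch below simply evaluates eqs. 5.5.2 and 5.5.3; the function
names and the values chosen for the missing-page probabilities and
the traverse time are hypothetical, picked only to satisfy
$\lambda(\tau,n) < \lambda(\tau,1)$.

```python
def real_time_paging_rate(lam, traverse_time):
    """p = lam / (1 + lam * T), eq. 5.5.2."""
    return lam / (1.0 + lam * traverse_time)


def total_page_traffic(lam, traverse_time, n):
    """Total traffic n * p of eq. 5.5.3, for n processes that each
    see missing-page probability lam."""
    return n * real_time_paging_rate(lam, traverse_time)


if __name__ == "__main__":
    n = 4
    traverse_time = 1000.0   # T, in units of one reference time (illustrative)
    lam_unshared = 0.01      # lambda(tau, 1), illustrative value
    lam_shared = 0.002       # lambda(tau, n), illustrative value
    psi_1 = total_page_traffic(lam_unshared, traverse_time, n)
    psi_2 = total_page_traffic(lam_shared, traverse_time, n)
    print(f"unshared traffic: {psi_1:.6f}, shared traffic: {psi_2:.6f}")
```

Because $\lambda/(1+\lambda T)$ is increasing in $\lambda$, any
reduction in the missing-page probability lowers the traffic.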
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5.6. Duty Factor 

Define the duty factor to be

(5.6.1)    $\eta(\tau,n)$ = [duty factor of one process when it is
                             sharing its program with (n-1) others]

Recall that $\eta(\tau,n)$ is the fraction of time a process is in
the running state as opposed to the page wait state; thus,
$\eta(\tau,n)$ measures the ability of a process to use a processor.

Theorem 5.7. Let $\eta(\tau,n)$ be the duty factor, as defined
    above. Then

(5.6.2)    $\eta(\tau,n) = \frac{1}{1 + \lambda(\tau,n)\,T}$

    where $\lambda(\tau,n)$ is the missing-page probability
    (Theorem 5.3), and T is the traverse time.

Proof: In a virtual time interval of length V, the process
encounters $V\lambda(\tau,n)$ page waits. Then

    $\eta(\tau,n) = \frac{\text{(virtual time)}}{\text{(virtual time)} + \text{(page wait time)}} = \frac{V}{V + V\lambda(\tau,n)\,T}$

QED.

If n=1, we have $\eta(\tau,1) = \eta(\tau)$, just like Theorem 4.6.

Theorem 5.8. Sharing increases the duty factor:

(5.6.3)    $\eta(\tau,n) > \eta(\tau,1)$    if $n>1$, $\tau>0$

Proof: Since $\lambda(\tau,1) > \lambda(\tau,n)$ we must have

    $1 + \lambda(\tau,1)\,T > 1 + \lambda(\tau,n)\,T$

hence

    $\frac{1}{1 + \lambda(\tau,1)\,T} < \frac{1}{1 + \lambda(\tau,n)\,T}$

QED.

Again, under many circumstances
$\lambda(\tau,1) \gg \lambda(\tau,n)$ and it is not difficult to
obtain $\eta(\tau,n) \approx 1$, even for small n. Thus,
sharing can result in markedly increased processing efficiency.
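A short numerical sketch of eq. 5.6.2 follows. The function name
and the sample values are hypothetical. The final check uses the
identity $\eta + pT = 1$, which is not stated in the text but
follows directly from eqs. 5.5.2 and 5.6.2.

```python
def duty_factor(lam, traverse_time):
    """eta = 1 / (1 + lam * T), eq. 5.6.2."""
    return 1.0 / (1.0 + lam * traverse_time)


if __name__ == "__main__":
    traverse_time = 1000.0                  # T (illustrative value)
    for lam in (0.01, 0.002, 0.0001):       # illustrative missing-page probabilities
        eta = duty_factor(lam, traverse_time)
        paging_rate = lam / (1.0 + lam * traverse_time)   # eq. 5.5.2
        # The fraction of time running plus the fraction in page wait is 1.
        assert abs(eta + paging_rate * traverse_time - 1.0) < 1e-12
        print(f"lambda={lam}: eta={eta:.4f}")
```

With these sample values a hundredfold drop in the missing-page
probability moves the duty factor from roughly one tenth to roughly
nine tenths.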
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5.7. Variable Number of Participants 

It is often the case that the number of participants in
a sharing problem is not fixed; instead, the number is a random
variable. The convexity theorem (Theorem 3.1) enables us to
obtain bounds on $w(\tau,n)$, $m(\tau,n)$, $\lambda(\tau,n)$,
$p(\tau,n)$, and $\eta(\tau,n)$ when the average value $\bar{n}$
of n is known but the distribution of n is not. These bounds are
summarized in the following table.



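quantity                     symbol             convexity   bound
                                                (in n)

expected working set size    $w(\tau,n)$        convex      $\overline{w(\tau,n)} \le w(\tau,\bar{n})$

one-process memory demand    $m(\tau,n)$        convex      $\overline{m(\tau,n)} \le m(\tau,\bar{n})$
(pages)

missing-page probability     $\lambda(\tau,n)$  concave     $\overline{\lambda(\tau,n)} \ge \lambda(\tau,\bar{n})$

paging rate                  $p(\tau,n)$        concave     $\overline{p(\tau,n)} \ge p(\tau,\bar{n})$

duty factor                  $\eta(\tau,n)$     convex      $\overline{\eta(\tau,n)} \le \eta(\tau,\bar{n})$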
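Two of the bounds can be spot-checked numerically. The sketch below
does so for $w(\tau,n)$ and $\lambda(\tau,n)$ only, using the closed
forms of Theorems 5.1 and 5.3. The distribution of n, the value of
$I(\tau)$, and the other constants are hypothetical. Note that the
sense in which the table uses "convex" for $w(\tau,n)$ appears to
correspond to what is now usually called concave, so the bound is
the familiar Jensen-type inequality.

```python
def expected_value(func, distribution):
    """Average func(n) over a discrete distribution {n: probability}."""
    return sum(prob * func(n) for n, prob in distribution.items())


def joint_working_set_size(n, z, i_tau):
    """w(tau, n) = z * (1 - (1 - I(tau))**n), eq. 5.3.3."""
    return z * (1.0 - (1.0 - i_tau) ** n)


def missing_page_probability(n, lam_single, i_tau):
    """lambda(tau, n) = lambda(tau, 1) * (1 - I(tau))**(n - 1), eq. 5.4.3."""
    return lam_single * (1.0 - i_tau) ** (n - 1)


if __name__ == "__main__":
    z, i_tau, lam_single = 100.0, 0.3, 0.05       # illustrative constants
    distribution = {1: 0.25, 2: 0.25, 6: 0.5}     # hypothetical distribution of n
    n_bar = expected_value(lambda n: n, distribution)

    w_avg = expected_value(lambda n: joint_working_set_size(n, z, i_tau), distribution)
    w_at_mean = joint_working_set_size(n_bar, z, i_tau)
    lam_avg = expected_value(
        lambda n: missing_page_probability(n, lam_single, i_tau), distribution)
    lam_at_mean = missing_page_probability(n_bar, lam_single, i_tau)

    print(f"n_bar={n_bar}: avg w={w_avg:.2f} <= w(n_bar)={w_at_mean:.2f}")
    print(f"n_bar={n_bar}: avg lambda={lam_avg:.5f} >= lambda(n_bar)={lam_at_mean:.5f}")
```

Evaluating the closed forms at a non-integer $\bar{n}$ is a
convenience of the sketch; the bounds themselves only need the
average value of n.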



In the most general n-process sharing problem, information
can be in use by any combination of processes, and each possible
combination will be sharing different subsets of information.
Suppose Z is the program in use by processes $p_1,\ldots,p_n$. We
can partition Z into as many as $2^n - 1$ blocks, such that exactly
some subset of $p_1,\ldots,p_n$ is using each block. Each block
associated with just one process behaves as system 1 in Figure 5-2.
Each block associated with more than one process behaves as system 2
in Figure 5-2, having higher per-process efficiency, lower
per-process memory-usage costs, and lower paging rates. Therefore
the net effect across the program Z is better than the situation
when $p_1,\ldots,p_n$ share nothing at all. Thus, system 1
represents the worst case behavior and system 2 represents the best
case behavior; any actual system would fall in between these two
extremes.
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5.8. Summary 

When working sets are defined so that a page belongs only 
to one working set at a time, namely the one of the process that 
most recently referenced it, memory usage costs tend to be dis- 
tributed among participants in accordance with their degrees of 
participation. Implementation is straightforward. 

Using a simple model with complete sharing we were able to
obtain strong results that quantitatively verify intuitive ideas:
provided that processes are run concurrently and the interreference
distribution is unchanging, sharing always improves performance,
regardless of the particular interreference distribution. In
many situations the improvements can be very pronounced.

Processes sharing information must be run concurrently 
(requiring multiple processors) whenever they are not blocked 
because 

1. If run at widely separated intervals, the same infor- 
mation must (unnecessarily) be reloaded. 

2. It is only when references are arriving concurrently 
to shared information that the benefits obtain. 

The results obtained here apply to a collection of n statis- 
tically independent processes, without regard to whether they 
are components of multiprocess computations or single-process 
computations. Thus, it should be clear that multiprocess com- 
putations, by permitting interprocess information sharing, can 
be very efficient, provided that processor-switching time is 
small and there are enough processors to permit the parallel 
operation of many processes. 
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CHAPTER 6 



Demands and Balance 



6.0. Introduction

We regard a computation, a collection of mutually cooper- 
ating processes and information operating within a common name 
space, as being the fundamental demand-making entity in a com- 
puter system. A computation manifests itself by demanding the 
joint use of processor and memory resources. 

Because we want a computation to operate effectively as a 
unit, we believe it is necessary to allocate resources to a com- 
putation as a unit. We therefore assume that the entities being 
scheduled for service are computations. This is a generalization 
of existing scheduling philosophies, which call for scheduling 
of processes. 

Let C be a computation, with working set $W_C(t,\tau)$. The
working set size $\omega_C(t,\tau)$ will be used to define C's
memory demand.

If, on the one hand, C is a single-process computation, its
expected running time beyond the present will be used to define
its processor demand. If, on the other hand, C is a multiprocess
computation, the number of active component processes will be
used to define its processor demand. We make this distinction
because in multiprocess computations the number, rather than the
duration, of processes is important, whereas in single-process
computations the duration of the process is important.

Each computation will be assigned a system demand consisting
jointly of its processor and memory demands. Computations
requiring the use of system resources will be segregated: those
in the standby set temporarily receive no service, whereas those
in the balance set receive service. The system is balanced when
the total demand of the balance set matches the available
equipment. A balance policy is a resource allocation policy that
regulates membership in the balance set so that the system
remains balanced.

We shall study all these concepts in more detail, then 
examine general properties of balance policies, and conclude 
the chapter with a survey of the pertinent literature. 



Recall that the N processors and M pages of main memory
constitute the equipment. Because we may wish to hold some
equipment in reserve, we assume that constants $\alpha$ and $\beta$
have been given (we shall discuss how in Chapter 8), where
$0<\alpha<1$ and $0<\beta<1$, and we will say that $\alpha N$
processors and $\beta M$ main memory pages constitute the
available equipment.
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6.1. Memory Demand 

We have defined a working set memory management policy to
be one that permits a process to run if and only if there is
enough uncommitted space in main memory to accommodate its
working set. Working set pages fill these uncommitted slots on
demand. Thus, the working set size $\omega(t,\tau)$ is a useful
measure of memory demand.

Let C be a computation consisting of processes $p_1,\ldots,p_n$.
If $W_{p_j}(t,\tau)$ is the working set of process $p_j$, then the
working set of C is

(6.1.1)    $W_C(t,\tau) = \bigcup_{j=1}^{n} W_{p_j}(t,\tau)$

If we use the working set definition given in Section 5.1
(a page is in the working set of whatever process most recently
referenced it), working sets will be disjoint, and their sizes
add:

(6.1.2)    $\omega_C(t,\tau) = \sum_{j=1}^{n} \omega_{p_j}(t,\tau)$

We define the memory demand $m_C(t)$ of computation C at time t
to be:

(6.1.3)    $m_C(t) = \min\left(1, \frac{\omega_C(t,\tau)}{M}\right)$,    $0 \le m_C(t) \le 1$

where M is the number of pages comprising main memory, and
$\omega_C(t,\tau)$ is the size of C's working set.

Clearly, $m_C(t)$ represents the fraction of the memory
resource demanded by C at time t. If C's working set $W_C(t,\tau)$
contains more than M pages (it exceeds memory) we regard its
memory demand as being $m_C(t)=1$ because it is demanding the
entire resource. Presumably M is large enough so that the
probability $\Pr[m_C(t)=1]$ (over the ensemble of all
computations) is very small.

The definition of memory demand applies to any computation, 
whether it be single-process or multiprocess. 
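Eqs. 6.1.2 and 6.1.3 translate directly into a few lines of code.
The sketch below is illustrative; the function name and the sample
sizes are hypothetical, and the per-process working sets are taken
to be disjoint as in Section 5.1.

```python
def memory_demand(working_set_sizes, main_memory_pages):
    """m_C(t) = min(1, omega_C / M), eq. 6.1.3.

    omega_C is the sum of the per-process working set sizes, which
    are assumed disjoint (eq. 6.1.2).
    """
    omega_c = sum(working_set_sizes)
    return min(1.0, omega_c / main_memory_pages)


if __name__ == "__main__":
    main_memory_pages = 256                        # M (illustrative value)
    print(memory_demand([10, 20, 34], main_memory_pages))   # three processes
    print(memory_demand([300], main_memory_pages))          # exceeds memory
```

A computation whose working set exceeds main memory is clamped to a
demand of 1, matching the convention stated above.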
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6.2. Processor Demand 

We assume that the processor demand of a multiprocess
computation depends on the number of processes, whereas the
processor demand of a single-process computation depends on the
duration of the process.

In contemporary computer systems, the page wait time T 
depends mostly on the rotation time of a device. Because the 
switching time of a processor is relatively much smaller than T, 
it is worthwhile to switch a processor to a second process during 
a page wait of the first. 

To be more precise, let S represent the time required to 
switch a processor from one process to another. S is actually 
the expectation of a random variable composed of electronic 
switching times and scheduling delays. If T>S , it is not eco- 
nomical to dedicate a processor to a process during a page wait, 
whereas, if T<S , it is economical to do so. 

Define the binary random variable $\pi(t)$ for a given process
at time t to be:

(6.2.1)    $\pi(t) = 1$ if a processor is assigned to the process
                        at time t,
           $\pi(t) = 0$ otherwise

This quantity $\pi(t)$ is related to the processor demand of a
process. The relationships among process states, memory demand,
$\pi(t)$, the traverse time T, and the processor switching time S,
are summarized in the following table.

process state       memory demand               processor demand

blocked             $m(t) = 0$                  $\pi(t) = 0$

ready               $m(t) > 0$                  $\pi(t) = 0$

running             $m(t) > 0$                  $\pi(t) = 1$

page wait           $m(t) > 0$ and              $\pi(u) = 1$ if $T<S$, or
during $(t,t+T)$    $m(t+T) = m(t) + 1/M$       $\pi(u) = 0$ if $T>S$,
                                                for $u \in (t,t+T)$



We have assumed that the entities to be scheduled (hereafter
called jobs) are computations, that is, specific sets of
processes rather than individual processes. Thus, we assume that
all a computation's non-blocked processes are running (or page
wait), or else all such processes are ready. We define the states
of a computation to be:

1. enabled: all non-blocked processes are running or
   page wait.

2. standby: all non-blocked processes are ready.

3. disabled: all processes are blocked.

In our work here, only these states are permitted.

Correspondingly, we define the working set of processes
P(C,t) of a computation C at time t to be:

(6.2.2)    $P(C,t)$ = {non-blocked processes of C at time t}
                          if C is enabled or standby
           $P(C,t) = \emptyset$
                          if C is disabled

where $\emptyset$ is the empty set. Note that P(C,t) is
well-defined even if C is a single-process computation.

Using these ideas, we shall define processor demand in 
both the single-process and multiprocess computation cases. 
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6.2.1. Multiprocess Computations 

In this case, the processor demand is concerned with the 
number of processes in a computation, because the computer sys- 
tem must know how many processing units to assign. 

Let C be a multiprocess computation. We define C's
processor demand $p_C(t)$ at time t to be:

(6.2.3)    $p_C(t) = \min\left(1, \frac{|P(C,t)|}{N}\right)$,    $0 \le p_C(t) \le 1$

where N is the number of processors, and P(C,t) is the working
set of processes in C (eq. 6.2.2).

It is clear that $p_C(t)$ represents the fraction of the
processor resources needed by C at time t. Presumably N is
large enough so that the probability $\Pr[p_C(t)=1]$ (over the
ensemble of all computations) is very small.

Note the symmetry between the definitions of processor 
and memory demand (eqs. 6.1.3 and 6.2.3), in the case of multi- 
process computations. 
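The multiprocess definition can be sketched as follows. Representing
a computation as a list of per-process state strings is an
assumption of this example, as are the function name and the sample
values; the working set of processes P(C,t) of eq. 6.2.2 is then
just the non-blocked entries, and it is empty when C is disabled.

```python
def processor_demand(process_states, num_processors):
    """p_C(t) = min(1, |P(C,t)| / N), eq. 6.2.3.

    process_states lists one state string per process of C; every
    state other than "blocked" counts toward P(C,t) (eq. 6.2.2).
    """
    non_blocked = [state for state in process_states if state != "blocked"]
    return min(1.0, len(non_blocked) / num_processors)


if __name__ == "__main__":
    num_processors = 8   # N (illustrative value)
    print(processor_demand(["running", "page wait", "blocked"], num_processors))
    print(processor_demand(["blocked", "blocked"], num_processors))   # disabled
    print(processor_demand(["running"] * 10, num_processors))         # clamped to 1
```

The clamping at 1 mirrors the treatment of memory demand in
eq. 6.1.3.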



6.2.2. Single-process Computations 

In the case of single-process computations, P(C,t) con- 
tains at most one process; so we must be concerned with its 
duration in order to know how long to assign a processor to it. 
Thus, computer systems in which single-process computations pre- 
dominate (systems such as Multics or IBM System 360) must use 
a somewhat different definition of processor demand. Because 
of this, we are unable to completely preserve the symmetry 
between the definitions of processor and memory demand. 
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We should like to define processor demand in the single- 
process case so that a processor demand has a meaning in the 
time domain analogous to the meaning of a memory demand in the 
space domain. A useful method (by no means the only one) is 
described in the following paragraphs. 

Let the random variable q denote the virtual time interval
between interactions. It has been found [C4,F4] that the
probability density function for q, $f_q(u)$, may be modelled by a
hyperexponential distribution:

(6.2.4)    $f_q(u) = c\,a\,e^{-au} + (1-c)\,b\,e^{-bu}$,    $0 < a < b$,  $0 < c < 1$

$f_q(u)$ is diagrammed in Figure 6-1; most of the probability is
concentrated toward small q (i.e., frequently interacting
processes), but $f_q(u)$ has a long exponential tail.

Given that it has been y vtu since the last interaction,
the conditional density function for the time beyond y until
the next interaction is

(6.2.5)    $f_{q|y}(u) = \frac{f_q(u+y)}{\int_y^\infty f_q(v)\,dv}$,    $u \ge 0$

which is just that portion of $f_q(u)$ for $q>y$ with its area
normalized to unity. The conditional expectation function Q(y) is

(6.2.6)    $Q(y) = \int_0^\infty u\,f_{q|y}(u)\,du = \frac{\frac{c}{a}\,e^{-ay} + \frac{1-c}{b}\,e^{-by}}{c\,e^{-ay} + (1-c)\,e^{-by}}$

Q(y) is the expected time beyond y until the next interaction;
it is illustrated in Figure 6-2. It starts at

(6.2.7)    $Q(0) = \frac{c}{a} + \frac{1-c}{b}$

and rises toward a constant maximum

(6.2.8)    $Q_{max} = Q(\infty) = \frac{1}{a}$

Note that, for large y, the conditional expectation Q(y) becomes
independent of y.

Figure 6-1. Probability density function $f_q(u)$.

Figure 6-2. Conditional expectation function Q(y).

The conditional expectation function Q(y) is a useful pre- 
diction function — if a process has consumed y vtu since its 
last interaction, we may expect it to consume an additional 
Q(y) vtu before its next interaction. 

A reasonable choice of quantum to allocate a process might
be $kQ(y)$ for some suitable constant $k>1$. It should be clear
that Q(y) can be measured and updated from time to time by the
computer system.
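The prediction function of eq. 6.2.6 is easy to evaluate. In the
sketch below the function names, the parameter values, and the
default $k = 2$ are all hypothetical. The numerator and denominator
of eq. 6.2.6 are both divided by $e^{-ay}$, an algebraically
equivalent form chosen here so that the ratio remains well-behaved
for large y.

```python
import math


def expected_remaining_time(y, a, b, c):
    """Q(y) of eq. 6.2.6 for the hyperexponential density of eq. 6.2.4.

    Both numerator and denominator are divided by exp(-a*y), which
    avoids a 0/0 result for large y (requires 0 < a < b, 0 < c < 1).
    """
    r = math.exp(-(b - a) * y)
    return (c / a + (1.0 - c) / b * r) / (c + (1.0 - c) * r)


def quantum(y, a, b, c, k=2.0):
    """Quantum k * Q(y); the default k = 2 is an arbitrary illustrative choice."""
    return k * expected_remaining_time(y, a, b, c)


if __name__ == "__main__":
    a, b, c = 0.1, 2.0, 0.3   # illustrative hyperexponential parameters
    for y in (0.0, 1.0, 5.0, 50.0):
        print(f"y={y}: Q(y)={expected_remaining_time(y, a, b, c):.3f}")
```

With these sample parameters Q(y) rises from
$Q(0) = c/a + (1-c)/b$ toward $Q_{max} = 1/a$, in line with
eqs. 6.2.7 and 6.2.8.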

We should point out that this notion (a conditional
expectation function to predict processor usage) is very useful
and quite independent of the hyperexponential distribution
hypothesis. We have formulated it in terms of the hyperexponential
because the hyperexponential is a good model and because
the hyperexponential has the interesting property that the
prediction function Q(y) becomes independent of y for large y.

(Footnote: That is, other prediction functions might be used. The
CTSS scheduler [C6,S3], for example, happens to use the prediction
function Q(y)=y.)

Just as we are unwilling to commit more than M pages of
memory, so we may be unwilling to commit processor time for more
than a standard interval A into the future. This interval A
can be chosen to reflect the maximum tolerable response time to
a user: for if the set of processes receiving service has total
expected time consumption not exceeding A, then no process in
this set expects to wait longer than A before its own interaction.
Just as M is a space constraint, so A is a time constraint.

We define the processor demand $p_C(t)$ of a single-process
computation C at time t to be:

(6.2.9)    $p_C(t) = \frac{Q(y_C)}{NA}$,    $\frac{Q(0)}{NA} \le p_C(t) \le \frac{Q_{max}}{NA}$

where $y_C$ is the time used by C's process since its last
interaction.

Since N processors have NA units of time to be committed
among them, $p_C(t)$ is the fraction of this total that C is
expected to need before its next interaction. Note that this
definition of processor demand is just the previous definition
(eq. 6.2.3), with $|P(C,t)| = 1$, multiplied by the expected
duration of processor use. It is no longer symmetric with the
definition of memory demand (eq. 6.1.3).
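As a small worked instance of eq. 6.2.9, the sketch below takes the
predicted remaining time $Q(y_C)$ as an input rather than computing
it. The function name and all numerical values are hypothetical.

```python
def single_process_demand(expected_remaining, num_processors, interval):
    """p_C(t) = Q(y_C) / (N * A), eq. 6.2.9.

    expected_remaining is Q(y_C), the predicted virtual time until
    the next interaction; interval is the standard interval A.
    """
    return expected_remaining / (num_processors * interval)


if __name__ == "__main__":
    num_processors = 4    # N (illustrative value)
    interval = 20.0       # A, in virtual time units (illustrative value)
    q_of_y = 3.35         # Q(y_C), illustrative value
    demand = single_process_demand(q_of_y, num_processors, interval)
    print(f"p_C = {demand:.6f}")
    # Multiplying back by N * A recovers the expected duration of use.
    print(f"N * A * p_C = {num_processors * interval * demand:.2f} vtu")
```

The second printed line illustrates the reading of $p_C(t)$ as a
fraction of the NA units of processor time available for commitment.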
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6.3. System Demand 

We define the system demand $d_C(t)$ of a computation C at
time t to be a pair

(6.3.1)    $d_C(t) = (p_C(t), m_C(t))$

where $p_C(t)$ and $m_C(t)$ are the processor and memory demands
of C.

That the processor demand is $p_C(t)$ tells us to expect C's
immediate processor need to be $N p_C(t)$ processors. That the
memory demand is $m_C(t)$ tells us to expect C's immediate memory
need to be $M m_C(t)$ pages.

This definition applies to C being either a multiprocess
or a single-process computation. It expresses the dual
manifestation of C, as a demand for processors and as a demand for
memory. $d_C(t)$ must be considered as a two-dimensional random
variable, with unknown correlations between $p_C(t)$ and $m_C(t)$.

(Footnote: If C is a single-process computation, then $p_C(t)$
tells us to expect C to require one processor for $N A p_C(t)$ vtu.)
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6.4. System Balance 

Let numbers $\alpha$ and $\beta$ be given, where $0<\alpha<1$ and
$0<\beta<1$, and let D(t) represent the total demand presented by
enabled computations:

(6.4.1)    $D(t) = \sum_{C \in B} d_C(t)$,    B = {enabled computations}

The computer system is said to be balanced at time t if

(6.4.2)    $D(t) = (\alpha,\beta)$

The system is processor-balanced if

(6.4.3)    $\sum_{C \in B} p_C(t) = \alpha$

The system is memory-balanced if

(6.4.4)    $\sum_{C \in B} m_C(t) = \beta$

That the system is balanced means that the total resource
requirement of enabled computations is simultaneously for
$\alpha N$ processors and for $\beta M$ main memory pages.

The resource allocation problem is to decide dynamically
which computations to enable so that balance is maintained.
This set of enabled computations at time t will be called the
balance set B. In general, the system will not be balanced at
each instant of time; instead there will be a sequence of instants,
called the decision points, at which the demand of B is made to
return to the desired demand $(\alpha,\beta)$ by admitting or
removing computations from the balance set B.

One of the chief advantages of balance is simply that the
balance set B presents (at least at decision points) a known
demand; that is,

(6.4.5)    $p_B(t) = \alpha$,  $m_B(t) = \beta$,    t a decision point
A major design problem, one we shall discuss in Chapter 8,
is that of determining the balance parameters $\alpha$ and $\beta$.
These parameters will be chosen so that, just before a decision
point t, the probabilities

(6.4.6)    $\Pr\left[\sum_{C \in B} p_C(t-\delta) > 1\right]$
           $\Pr\left[\sum_{C \in B} m_C(t-\delta) > 1\right]$    $\delta > 0$

are as small as desired.

In Figure 6-3 we have diagrammed the flow of jobs (i.e., 
computations) among the states enabled, standby, and disabled. 
New jobs enter the standby set . The scheduler regulates member- 
ship in the balance set so that balance is maintained. If a 
computation becomes disabled, it enters the disabled set. These 
points should be noted: 

1. Each job in the standby set has its demand associated
   with it. When a new job enters the standby set, an
   estimate of demand must be associated with it. In the
   absence of reliable predictive information, the best
   estimate is $(\bar{p},\bar{m})$, where $\bar{p}$ is the average
   processor demand over all computations, and $\bar{m}$ is the
   average memory demand over all computations.

2. In general, when the scheduler admits a new job to the
   balance set, it allocates a quantum q to the job, where
   q represents the total virtual-time processor consumption
   permitted to the job's processes. If q expires
   (at time t), the job is returned to the standby set,
   with its current demand (p(t),m(t)).

3. The balance set contains a mixture of running and page
   wait processes, together with their working sets.
   Memory management follows a working set strategy.

Figure 6-3. Job flow in balanced computer system.
Memory management follows a working set strategy. 

In the case of single-process computations, the following
terminology (from Multics) is often used. Any process in the
standby set is said to be ready, and so the standby set may be
called the ready list. Any process in the disabled set is blocked,
and so the disabled set may be called the blocked list. Any
process in the balance set is either running or page-wait, and
so the balance set may be called the running list. There is,
however, a very important difference with Multics: here, a
page-wait process remains a member of the balance set; in Multics,
a page-wait process is regarded as being blocked.
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6.5. Balance Policies 

A balance policy is a resource allocation policy that keeps 
the computer system balanced. It is implemented by the scheduler 
shown in Figure 6-3, which regulates membership in the balance 
set. Expressed as a minimization problem, a balance policy is: 

(6.5.1)   minimize $|D(t) - (\alpha,\beta)|$ 

where $|D(t) - (\alpha,\beta)|$ stands for componentwise minimization. 



6.5.1. Demand and Usage Spaces 

To help visualize the operation of a balance policy, it is 
useful to define two spaces: the demand space V and the usage 
space U. We regard V and U as being two-dimensional: a typical 
point (p,m) in either space is capable of representing the demand 
of some computation. The demand space V contains a set of 
specially designated points, the demand points, one representing 
the demand $d_C(t)$ of each enabled computation C in the balance set. 
The demand points are time-varying in position. The usage space 
U contains two specially designated points: the actual demand 
point $D(t)$ and the desired demand point $(\alpha,\beta)$. A balance policy 
tries to move the actual demand point, along some path, closer 
(in the sense of eq. 6.5.1) to the desired demand point. These 
ideas are illustrated in Figure 6-4. 

Unfortunately we must be careful not to interpret the spaces 
V and U as metric spaces, because the path $D(t)$ follows when it 
moves toward $(\alpha,\beta)$ affects system behavior. If these spaces were 
metric spaces we would be able to assign a magnitude, say $\mu(d)$, 
to a demand d, which would in turn imply that system performance 
would depend only on the magnitude of the imbalance, 
$\mu(D(t) - (\alpha,\beta))$. The following argument shows that this is 
not the case. 

Figure 6-5 shows how system performance will depend on the 
path. The paths shown are: 

1. path 1. (first balance memory, then processor.) First 
examine the standby set for a subset of computations 
with memory demand $\delta_m$; second, select from this subset 
a computation with processor demand $\delta_p$. In the first 
step the whole standby set is examined, so it is highly 
probable a computation with memory demand $\delta_m$ will be 
found. In the second step only a subset of computations 
(those with memory demand $\delta_m$) is examined, so it is 
less likely a computation with processor demand $\delta_p$ 
will be found. The result is that memory usage is 
tightly distributed about $\beta M$, whereas processor usage 
is loosely distributed about $\alpha N$. 

2. path 2. (first balance processor, then memory.) This 
has exactly the opposite effect from path 1, the memory 
usage being loosely distributed about $\beta M$, the processor 
usage being tightly distributed about $\alpha N$. 

3. path 3. Balance both processor and memory simultaneously 
by examining the standby set for a computation whose 
demand is exactly $(\delta_p, \delta_m)$. This has an effect 
intermediate between those of path 1 and path 2, the 
processor and memory usage tending to be equally distributed 
about the desired points. 

Now in itself the path effect need not interfere with performance. 
But in computer systems in which the traverse time is large and 
computations are single-process, it is extremely important to 
balance memory properly in order to avoid thrashing, and to 
avoid accumulating too many traverse times from returning pages. 
In such systems path 1 is the best path, and we can 
tolerate higher imbalance in processor usage, because (see 
eq. 6.2.8) the standard interval A is a delay or response 
constraint and is therefore a value judgment, whereas the memory 
size M is a physical constraint. 

Conversely, in computer systems in which the traverse time 
is not large and computations are multiprocess, path 3 is the 
best path because we are equally concerned with processor and 
memory balance. 
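
The difference between path 1 and path 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each standby job is summarized by a hypothetical pair (processor demand, memory demand), the function names are invented here, and path 3 is approximated by picking the job whose worse component is closest to the target rather than requiring an exact match.

```python
from typing import List, Optional, Tuple

# (processor demand p, memory demand m); a hypothetical representation
Demand = Tuple[float, float]


def select_path1(standby: List[Demand], delta_p: float, delta_m: float,
                 tol: float = 0.0) -> Optional[Demand]:
    """Path 1: match memory first, then take the closest processor demand."""
    # Step 1: keep only jobs whose memory demand matches delta_m (within tol).
    subset = [d for d in standby if abs(d[1] - delta_m) <= tol]
    if not subset:
        return None
    # Step 2: within that subset, take the job closest to delta_p.
    return min(subset, key=lambda d: abs(d[0] - delta_p))


def select_path3(standby: List[Demand], delta_p: float,
                 delta_m: float) -> Optional[Demand]:
    """Path 3 (approximation): judge both components together."""
    if not standby:
        return None
    # Assumed criterion: minimize the larger of the two component errors.
    return min(standby,
               key=lambda d: max(abs(d[0] - delta_p), abs(d[1] - delta_m)))
```

With the same standby set, path 1 returns a job whose memory demand is exact but whose processor demand may be off, while the path 3 approximation spreads the error over both components.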



6.5.2. Properties of a Balance Policy 

Although we shall defer detailed discussion about implemen- 
ting balance policies until the next chapter, it is nonetheless 
useful to point out certain properties the implementation should 
or will have, consistent with the objectives of a multiprocess 
computer system. 

First, the balance criterion is not necessarily an equipment 
utilization criterion. If $(\alpha,\beta)$ are set close to (1,1) then 
certainly equipment is fully utilized. If $(\alpha,\beta)$ are set much 
less than (1,1), the service to users is improved because, as 
we discussed in Chapter 1, there is an inverse tradeoff between 
utilization and service. Therefore $\alpha$ and $\beta$ can be regulated by 
the administration to meet its current objectives. 

Second, balance significantly diminishes the possibility 
of thrashing because, by proper selection of the balance 
parameters $\alpha$ and $\beta$, the probability that the system actually enters 
an overload condition can be made arbitrarily small. 

Third, we shall see in Chapter 7 that an implementation of 
a balance policy can be made to have the property that the 
relative computational overhead required to restore balance depends 
only on the degree of imbalance (i.e., $(\delta_p, \delta_m)$ in Figure 6-5) 
and not on the size of the total demand $D(t)$. This guarantees 
that balance is configuration independent in the sense that 
the same basic strategy scales over a very wide range of loads. 

Fourth, the reader will recall that we have assumed fairness 
should be built in to a balance policy. We will say that a policy 
is fair if a job's waiting time depends only on its order of 
arrival relative to jobs of comparable demand, and not on its 
order of arrival relative to jobs of different demand. Many 
existing scheduling philosophies, which tend to stall long jobs 
in deference to shorter jobs, are not fair by this definition. 
In the next chapter we shall show how to incorporate fairness 
into a balance policy. 

Fifth, balance makes it possible to make jobs independent 
of one another, in the sense that an increase in the demand of 
one will not interfere with the resources in use by another. 
If the balance parameters $\alpha$ and $\beta$ are set less than unity, then 
there will be a slack of $(1-\alpha)N$ processors and $(1-\beta)M$ memory 
pages to absorb just such demand fluctuations. This paves the 
way to tractable analysis. 

Sixth, a balance policy should tend to run two processes 
concurrently whenever they are sharing information, in order to 
reap the benefits of sharing. 
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6.6. Survey of the Literature 

Much of the literature is devoted exclusively to problems 
of processor scheduling or to memory allocation : little of it 
is devoted to a unified treatment of both. 

By far the greatest part of the literature addresses itself 
to scheduling; a myriad of algorithms and analyses have appeared. 
Estrin and Kleinrock [E1] and then Coffman and Kleinrock [C3] 
have very good surveys of the important algorithms. In refer- 
ence [C3] it is demonstrated that all the algorithms are sus- 
ceptible to countermeasures : because most algorithms favor small 
tasks (both in size and duration) either explicitly or implicitly, 
a user may significantly improve service to himself by subdivid- 
ing his job into a sequence of small tasks (provided no one else 
does this too). This does overall efficiency no good. 

A variety of papers report on memory allocation [B1,B2,D2, 
D4,P4,R2]; we have already discussed these in Chapter 3. 

Not much reported work deals with interactions between 
processor and memory. The approaches most often used are either 
to regard scheduling as the primary allocation function, memory 
management as the secondary allocation function, or else to re- 
gard them both as being independent. It should be obvious by 
now that in existing systems the problem of memory management 
is of far greater importance than that of scheduling, on account 
of the very large traverse time and the serious possibility of 
thrashing. Therefore, if memory is properly managed, almost 
any reasonable scheduling algorithm will function well. Con- 
versely, if memory is mismanaged or overloaded, the particular 
scheduling algorithm will be of little consequence. 
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No work is reported that gives insight into how scheduling 
problems are compounded by information sharing. For example, 
it is clear that scheduling algorithms should tend to run two 
processes together in time whenever they are sharing information. 
However no study reports on the extent to which a scheduling 
algorithm should tend to do this, or whether it should tend to 
do this explicitly at all. 

There has been some work on system balance. It falls into 
two classes: static balance , the problem of determining an op- 
timum equipment configuration for a given program mix; and dyn- 
amic balance , the problem of dynamically adjusting the load to 
the existing equipment. Nielsen [N1,N2] has reported on simul- 
ation work for static balance which has been of considerable help 
in configuring the Stanford version of IBM System 360. Saltzer 
[S2] describes some rule-of-thumb performance measurements that 
may be used to test a system to decide whether or not it is 
statically balanced or whether it is thrashing. 

The most interesting work of all concerns dynamic balance. 
Oppenheimer and Weizer [O2] report that their simulation of the 
RCA Spectra 70/46 Time Sharing Operating System verifies 
conclusively that even relatively primitive notions of dynamic memory 
balance result in markedly improved performance. O'Neill, 
Belady, and colleagues [O1] have been experimenting with a 
load-leveler on the M44/44X computer, and have been very pleased 
with the results. 
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6. 7. Summary 

The important concepts introduced in this chapter all 
center around the idea of supply-and-demand allocation in large 
computer systems. Memory demand is based on working set size. 
Processor demand is based on the intensity or the duration of 
processor requirements. System demand is a composite of these 
two types of demand. 

Computations requiring resources are divided into two classes: 
standby set computations, which are temporarily denied use of 
system resources; and balance set computations, which are granted 
the use of system resources. The system is balanced just when 
the total demand of the balance set matches the available equipment. 

A balance policy is a resource allocation policy that reg- 
ulates membership in the balance set so that system balance is 
maintained. The demand space and usage space were introduced as 
conceptual aids to understanding properties of balance policies. 
An important property, the path effect, is the dependence of per- 
formance on the order in which the processor and memory resources 
are balanced. 

We distinguished two aspects of balance. The first aspect, 
static balance (controlled by the administration) is the problem 
of matching the equipment configuration to the total demand of 
the user community. We return to this in Chapter 8. The second 
aspect, dynamic balance (controlled by the scheduler) is the 
problem of matching the demand of the balance set to the existing 
equipment. We direct attention to this in the next chapter. 
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CHAPTER 7 



Implementation of Balance Policies 



7.0. Introduction 

The general structure and basic properties of a balance 
policy have been given in Chapter 6. It remains to show how a 
balance policy can be realized. 

The three most important things we are requiring from a 
balance policy are: first, that it keep the system balanced; 
second, that it be fair; and third, that it assure reasonable 
policies with respect to other criteria such as minimum response 
time. 

We distinguish two cases: the one-dimensional case is ap- 
plicable to contemporary computer systems, in which the threat 
of thrashing makes memory balance so much more important than 
processor balance; the two-dimensional case is applicable to 
future computer systems, in which both processor and memory 






balance will be equally important. In one-dimensional cases, 
we explicitly balance only one resource type and try to achieve 
reasonable balance of the other resource type, whereas in two- 
dimensional cases we explicitly balance both resource types 
simultaneously. 

The most important result of this chapter is: we formulate 
mathematical programming problems whose solutions, found dy- 
namically by the scheduler, are almost-optimum balance policies. 

In Section 7.1 we present an analysis of a single-server, 
first-come, first-served queue, because this can act as a worst 
case analysis for the behavior of the queue structures we pro- 
pose. In Section 7.2 we study properties of queue structures 
that guarantee fair policies, and we find bounds on the processor 
and memory requirements needed to act as servers to the queues. 
In Section 7.3 we formulate the one-dimensional mathematical 
programming problem, and give a simple algorithm that finds the 
optimum solution in the particular case of memory balance. In 
Section 7.4 we formulate the two-dimensional mathematical pro- 
gramming problem, but we do not attempt to give solutions. 
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7.1. Analysis of a Single-Server Queue 

The statistics of a first-come, first-served (FCFS) single 
server queue can be used to obtain the worst case behavior of a 
balanced computer system. 

The queueing system under consideration is shown in Figure 
7-1. Job interarrival times are exponentially distributed with 
mean $1/a$. That is, if $\{t_n\}$ is the sequence of instants at which 
jobs arrive, the interarrival times $y_n = t_n - t_{n-1}$ are identically 
distributed according to the density function 

(7.1.1)   $f_{y_n}(u) = f_y(u) = a e^{-au}, \qquad u \ge 0$ 

Similarly, the job service times are exponentially distributed, 
with mean $1/b$. The rate (a) of job arrivals remains fixed, 
regardless of the number of jobs in the system; in other words, 
we regard the source population as being infinite. 

Our use of exponential interarrival and service times, and 
an infinite source population, requires justification. 

We are directing the analysis toward large systems, in which 
a large source population generates the service requests. In 
these systems, exponential interarrival and service distributions 
are good models for at least two reasons. First, it is well 
known that, when a large population generates service requests, 
the times between arrivals from the population tend to be ex- 
ponentially distributed, even though the times between arrivals 
from a particular member of the population do not. The tele- 
phone system, for which it has been found that the interarrival 
and service distributions are very nearly exponential [P3, p.281], 
is an excellent example of this behavior. Second, there is 
considerable evidence to indicate that many interarrival and 
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service distributions are approximately hyperexponential in the 
case of not too large populations [C4,P4] ; these distributions 
have exponential tails. By assuming exponential arrival and ser- 
vice distributions, we are modelling the tails of the actual dis- 
tributions, thereby providing a worst case analysis. 

Furthermore, the exponential case, interesting in its own 
right, can yield insights perhaps not obtainable from protracted 
analysis . 

When the source population is finite and the server is sat- 
urated, jobs pile up in the queue and, there being fewer reques- 
tors remaining in the source population, the arrival rate slack- 
ens. Because we are interested in the unsaturated behavior 
of a computer system our use of infinite source populations 
(in which the arrival rate is independent of queue length) is 
not unreasonable. Assuming that balance policies keep the com- 
puter system out of saturation, a job will not be seriously de- 
layed in the queues, and the source population will not be ser- 
iously depleted. Thus, the arrival rate will not significantly 
slacken. Indeed, experience has shown that infinite population 
models approximate finite population systems with surprisingly 
little error, for population sizes as small as 20 [F2, Vol.1, 
p.143ff]. 

Nevertheless, since we are in fact making approximations 
to actual behavior by using these assumptions, we can only 
interpret the results as being system averages . 

When (n-1) jobs are in the queue and 1 job is in service, 
the system is in state n. Let $\pi_n$ denote the steady state 
probability of state n. 
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Theorem 7. 1 . In the queueing system described above, the 
steady state probability of state n is 

(7.1.2)   $\pi_n = r^n (1-r), \qquad n = 0, 1, 2, \ldots$ 

where $r = a/b$. The mean and variance of n are 

(7.1.3)   $\bar{n} = \frac{r}{1-r}, \qquad \sigma_n^2 = \frac{r}{(1-r)^2}$ 

Proof : In a small time interval dt, the probability of a 
transition from state (n-1) to n is (a dt); in the same interval dt, 
the probability of a transition from n to (n-1) is (b dt). 
Therefore in the steady state we must have 

$\pi_{n-1}\, a\, dt = \pi_n\, b\, dt$ 

which means 

$\pi_n = \frac{a}{b}\, \pi_{n-1}$ 

Letting $r = a/b$, we have 

$\pi_n = r^n \pi_0$ 

The generating function for n is 

$G(z) = \sum_{n=0}^{\infty} \pi_n z^n = \frac{\pi_0}{1 - rz}, \qquad r < 1$ 

Since $G(1) = 1$, we have $\pi_0 = 1-r$ and thus 

$\pi_n = r^n (1-r)$ 

which verifies eq. 7.1.2. The mean and variance of n are given 
by the usual expressions: 

$\bar{n} = G'(1) = \frac{r}{1-r}$ 

$\sigma_n^2 = G''(1) + G'(1) - \bar{n}^2 = \frac{r}{(1-r)^2}$ 

which verify eqs. 7.1.3. 

QED. 
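
As a numerical cross-check of Theorem 7.1, the following sketch compares the closed forms of eqs. 7.1.3 with moments summed directly from the distribution of eq. 7.1.2. The function names and the truncation point `n_max` are assumptions made for this illustration only.

```python
def state_probability(n: int, a: float, b: float) -> float:
    """pi_n = r^n (1 - r) with r = a/b (eq. 7.1.2); requires a < b."""
    if not 0 <= a < b:
        raise ValueError("need 0 <= a < b for a steady state")
    r = a / b
    return (r ** n) * (1.0 - r)


def queue_moments(a: float, b: float) -> tuple:
    """Closed-form mean and variance of the number in system (eqs. 7.1.3)."""
    r = a / b
    return r / (1.0 - r), r / (1.0 - r) ** 2


def truncated_moments(a: float, b: float, n_max: int = 2000) -> tuple:
    """Mean and variance summed from the distribution, truncated at n_max."""
    probs = [state_probability(n, a, b) for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * n * p for n, p in enumerate(probs))
    return mean, second - mean ** 2
```

For arrival rate a = 3 and service rate b = 4 (so r = 0.75), both routes give a mean of 3 jobs and a variance of 12, up to rounding.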



Theorem 7.2 . Let the random variable y denote the time a job 
waits in the queue described above until it enters service. 
Then the probability density function $f_y(u)$ is given by 

(7.1.4)   $f_y(u) = \begin{cases} \frac{b-a}{b} & u = 0 \\ \frac{a}{b}(b-a)\,e^{-(b-a)u} & u > 0 \end{cases}$ 

and the mean and variance of y are given by 

(7.1.5)   $\bar{y} = \frac{a}{b}\,\frac{1}{b-a}, \qquad \sigma_y^2 = \frac{a}{b}\,\frac{2 - a/b}{(b-a)^2}$ 

Proof : Observe that 

(7.1.6)   $y = \begin{cases} 0 & \text{if } n = 0, \quad \Pr[y=0] = \pi_0 = 1-r \\ \sum_{i=1}^{n} s_i & \text{otherwise} \end{cases}$ 

Here, $s_i$ is the random variable of service time for one job, 
whose density function satisfies 

$f_{s_i}(u) = b e^{-bu}, \qquad u > 0$ 

for each subscript i. Eq. 7.1.6 is obtained by noting: if the 
job arrives to find n already in the system it must wait for all 
n to complete service. 

Suppose $n \ge 1$. We want to find $f_{y,n}(u)$, the density function 
for y when n are in the system. Observe that 

$f_{y,n}(u)\,du = \Pr[(n-1)$ events in u and 1 event in $(u, u+du)]$ 

$\qquad = \frac{(bu)^{n-1}}{(n-1)!}\, e^{-bu}\, (b\,du)$ 

so that 

$f_{y,n}(u) = b\, \frac{(bu)^{n-1}}{(n-1)!}\, e^{-bu}$ 

To find $f_{y,n \ge 1}(u)$: 

$f_{y,n \ge 1}(u) = \sum_{n=1}^{\infty} f_{y,n}(u)\, \pi_n = \sum_{n=1}^{\infty} b\, \frac{(bu)^{n-1}}{(n-1)!}\, e^{-bu}\, r^n (1-r)$ 

$\qquad = r\, b\, (1-r)\, e^{-bu} \sum_{n=1}^{\infty} \frac{(rbu)^{n-1}}{(n-1)!} = r\, b\, (1-r)\, e^{-b(1-r)u}$ 

$f_{y,n \ge 1}(u) = \frac{a}{b}\,(b-a)\, e^{-(b-a)u}$ 

since $r = a/b$. Finally, 

$f_{y,0}(u) = \pi_0 = \frac{b-a}{b}$ at $u = 0$ 

Dropping the use of the second subscript, 

$f_y(u) = \begin{cases} \frac{b-a}{b} & u = 0 \\ \frac{a}{b}(b-a)\,e^{-(b-a)u} & u > 0 \end{cases}$ 

which verifies eq. 7.1.4. 

The mean and variance of waiting time are 

$\bar{y} = \int_0^{\infty} u\, f_y(u)\, du = \frac{a}{b}\,\frac{1}{b-a}$ 

$\sigma_y^2 = \overline{y^2} - \bar{y}^2 = \frac{2a}{b\,(b-a)^2} - \frac{a^2}{b^2 (b-a)^2} = \frac{a}{b}\,\frac{2 - a/b}{(b-a)^2}$ 

which verify eqs. 7.1.5. 

QED. 
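
The waiting-time moments can be checked numerically. The sketch below integrates the continuous part of the density in eq. 7.1.4 with a simple midpoint rule and compares the result with closed forms derived from that density (mean r/(b-a), second moment 2r/(b-a)^2, with r = a/b). The function names, the integration cutoff, and the step count are assumptions chosen for illustration.

```python
import math


def wait_density(u: float, a: float, b: float) -> float:
    """Continuous part of the waiting-time density for u > 0 (eq. 7.1.4)."""
    return (a / b) * (b - a) * math.exp(-(b - a) * u)


def wait_moments_closed(a: float, b: float) -> tuple:
    """Mean and variance of the wait implied by the density of eq. 7.1.4."""
    r = a / b
    mean = r / (b - a)
    second = 2.0 * r / (b - a) ** 2      # second moment of the continuous part
    return mean, second - mean ** 2


def wait_moments_numeric(a: float, b: float, steps: int = 200_000) -> tuple:
    """Midpoint-rule integration; the probability mass at u = 0 adds nothing."""
    upper = 60.0 / (b - a)               # the tail beyond this is negligible
    h = upper / steps
    mean = second = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        w = wait_density(u, a, b) * h
        mean += u * w
        second += u * u * w
    return mean, second - mean ** 2
```

For a = 1 and b = 2, both functions give a mean wait of 0.5 and a variance of 0.75, within the integration error.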



Theorems 7.1 and 7.2 can be used to find bounds on the 
number in the system and on the waiting time. A bound on the 
number n in the system can be obtained from 

(7.1.7)   $\Pr[n > u] = \sum_{n=u+1}^{\infty} \pi_n = r^{u+1}$ 

and a bound on the waiting time y from 

(7.1.8)   $\Pr[y > u] = \int_u^{\infty} f_y(v)\, dv = \frac{a}{b}\, e^{-(b-a)u}$ 
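
One way such tail probabilities might be used is to size a queue so that overflow is rare. The sketch below evaluates Pr[n > u] = r^(u+1) and Pr[y > u] = (a/b) exp(-(b-a)u), and searches for the smallest u meeting a tolerance. The helper `smallest_capacity` and its tolerance parameter `eps` are hypothetical, introduced here only to illustrate the bounds.

```python
import math


def prob_more_than(u: int, a: float, b: float) -> float:
    """Pr[n > u] = r^(u+1) with r = a/b (eq. 7.1.7)."""
    return (a / b) ** (u + 1)


def prob_wait_exceeds(u: float, a: float, b: float) -> float:
    """Pr[y > u] = (a/b) exp(-(b-a)u) (eq. 7.1.8)."""
    return (a / b) * math.exp(-(b - a) * u)


def smallest_capacity(eps: float, a: float, b: float) -> int:
    """Smallest u with Pr[n > u] <= eps (hypothetical helper)."""
    if not (0 < a < b and eps > 0):
        raise ValueError("need 0 < a < b and eps > 0")
    u = 0
    while prob_more_than(u, a, b) > eps:
        u += 1
    return u
```

For example, with a = 1 and b = 2 the probability of finding more than 2 jobs in the system is 0.125, and a capacity of 6 keeps the overflow probability at or below 0.01.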
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7.2. Organization of the Queues 

The systems of queues described below are embedded in the 
standby set. They are organized so that the scheduler can quickly 
locate jobs of whatever demand it seeks. They are specifically 
intended for use in contemporary computer systems, in which the 
grave danger of thrashing makes it so overridingly important to 
balance memory. They are the queues for use in the one- 
dimensional case. 

The following discussion illustrates how fairness can be 
incorporated into a balance policy. It also establishes bounds 
on the total processor and memory requirements needed to accom- 
modate the balance set. 



7.2.1. An Almost-Continuous System of Queues 

Figure 7-2 illustrates a very general, one-dimensional 
queueing structure. We assume that jobs (i.e., computations) 
are arriving at random, interarrival times exponential with 
mean $1/a$. Job (working set) sizes are integers s, $s \in [1, s_0]$, 
$s_0$ being the size of the largest working set. The job size 
distribution (of incoming jobs) is 

(7.2.1)   $\Pr[s = i] = f_s(i)$ 

and 

(7.2.2)   $\sum_{i=1}^{s_0} f_s(i) = 1$ 

Let $Q = \{1, \ldots, k, \ldots, s_0\}$ denote the set of queues. A job of 
size k is placed at the end of the k-th queue. 
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PCPS queues 




Figure 7-2. Sorting jobs into size classes in standby set. 






In our work here, we require that a balance policy be fair. 
To accomplish this we have made the individual queues FCFS, and 
we will require that the scheduler keep at least one job from 
each queue in service. Thus, a job's waiting time will depend 
only on its order of arrival relative to jobs of the same size, 
and not on jobs of different size. 

Associated with the k-th queue is a quantum $q_k$. The quantum 
$q_k$ is assumed to be the same for each job in the k-th queue, 
regardless of its past. 

The scheduler controls membership in the balance set B so 
that balance is maintained; that is, so that the balance set 
demand $(p_B, m_B)$ is kept within close tolerances of the desired 
$(\alpha,\beta)$. Because each job is assigned a quantum, its time in B 
is bounded, so the scheduler need only control entries to B 
(cf. Figure 6-3). 

A job (of size j) may exit the balance set B for one of 
three reasons: 

1. Its quantum expired, in which case it is entered at 
the end of the j-th queue. 

2. It became disabled, in which case it enters the disabled set. 

3. It quit. 

In general, a job will fluctuate in size during execution. Thus, 
if it is of size k at entry to B, it may be of size $j \ne k$ upon 
exit from B. We assume a condition of statistical equilibrium, 
so that, on the average, a job of size k entering B implies that, 
within $q_k$, some job of size k will exit B. 
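
The queue structure just described can be sketched as a small data structure: one first-come, first-served queue per working-set size, with a job re-entered at the end of the queue for its current size when its quantum expires. The class and method names are hypothetical, and jobs are represented by plain strings for brevity.

```python
from collections import deque
from typing import Deque, Dict, Optional


class StandbyQueues:
    """One FCFS queue per working-set size 1..s0 (hypothetical sketch)."""

    def __init__(self, s0: int) -> None:
        self.queues: Dict[int, Deque[str]] = {
            k: deque() for k in range(1, s0 + 1)
        }

    def enter(self, job: str, size: int) -> None:
        """Place a job at the end of the queue for its current size."""
        self.queues[size].append(job)

    def admit(self, size: int) -> Optional[str]:
        """Remove and return the oldest waiting job of this size, if any."""
        q = self.queues[size]
        return q.popleft() if q else None
```

A job admitted from queue 2 whose working set has grown to size 3 by the time its quantum expires would be re-entered with `enter(job, 3)`, behind the jobs already waiting there.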
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The arrival rate to the k queue is 
(7.2.3)   $a_k = a\, f_s(k)$ 

and, from eq. 7.2.2, 

(7.2.4)   $\sum_{k \in Q} a_k = \sum_{k \in Q} a\, f_s(k) = a$ 

We may assume that $1/a_k$ is the mean of an exponential 
distribution, because $1/a$ is the mean of an exponential distribution and 
jobs are statistically independent. The service rate for the 
k-th queue is $b_k$, and $1/b_k$ is the average service time for jobs in 
the k-th queue. We assume that $1/b_k$ is the mean of an exponential 
distribution. 

The analysis of Section 7.1 can be used to estimate the 
behavior of each queue when each receives service independently 
of the others and one job at a time is serviced from each. In 
reality, more than one job from each queue may be in service, 
in which case the analysis of Section 7.1 must be interpreted 
as the worst-case behavior. 






Theorem 7.3 . Suppose B acts as a single server to each of the 
$s_0$ queues in Figure 7-2. Let $a_k$ be the arrival rate to 
the k-th queue, $b_k$ be the service rate of the k-th queue, 
and let $r_k = a_k/b_k$. Then the expected processor and memory 
demand of the balance set B are: 

(7.2.5)   $\bar{p}_B = \sum_{k \in Q} r_k, \qquad \bar{m}_B = \sum_{k \in Q} k\, r_k$ 

Proof : Using the result of Theorem 7.1, let 

$\pi_{0k} = \Pr[$k-th subsystem is empty$] = 1 - \frac{a_k}{b_k} = 1 - r_k$ 

where the k-th subsystem comprises the k-th queue and the single 
job from it in service. Define the random variable $y_k$: 

$y_k = \begin{cases} 1 & \text{if k-th subsystem non-empty:} \quad \Pr[y_k = 1] = 1 - \pi_{0k} \\ 0 & \text{otherwise:} \quad \Pr[y_k = 0] = \pi_{0k} \end{cases}$ 

Thus, 

$\Pr[y_k = 1] = r_k$ 
$\Pr[y_k = 0] = 1 - r_k$ 

Then the random variable $p_B$ of processor demand is 

$p_B = \sum_{k \in Q} y_k$ 

and the random variable $m_B$ of memory demand is 

$m_B = \sum_{k \in Q} k\, y_k$ 

The expectations are: 

$\bar{p}_B = \sum_{k \in Q} \bar{y}_k = \sum_{k \in Q} \Pr[y_k = 1] = \sum_{k \in Q} r_k$ 

$\bar{m}_B = \sum_{k \in Q} k\, \bar{y}_k = \sum_{k \in Q} k \Pr[y_k = 1] = \sum_{k \in Q} k\, r_k$ 

QED. 



There is an interesting special case, in which the running 
time of a job of size k is inversely proportional to the 
probability $f_s(k)$: 

(7.2.6)   (mean running time)$_k = \frac{1}{b_k} = \frac{1}{b\, f_s(k)}$ 

for some constant b. This behavior may in fact occur in some 
real situations. For example, the quantum $q_k$ could be chosen 
to be: 

(7.2.7)   $q_k = \frac{1}{b\, f_s(k)}$ 

In this case, 

(7.2.8)   $r_k = \frac{a_k}{b_k} = \frac{a\, f_s(k)}{b\, f_s(k)} = \frac{a}{b} = r$ 

That is, $r_k = r$ is constant for all the queues. 



Theorem 7.4 . Suppose the conditions of Theorem 7.3 and eq. 7.2.7 
hold. Then the expected processor and memory demands of 
the balance set are: 

(7.2.9)   $\bar{p}_B = s_0\, r, \qquad \bar{m}_B = \frac{s_0 (s_0 + 1)}{2}\, r$ 

Furthermore, when demand is not too high, that is $r \ll 1$ 
(the queues are sparsely populated), then 

(7.2.10)   $\bar{m}_B \ll \frac{s_0^2}{2}$ 

Proof : Eqs. 7.2.9 follow directly from eqs. 7.2.5 with $r_k = r$. 
If $r \ll 1$, then $r s_0 \ll s_0$, and we have 

$\bar{m}_B = \frac{s_0 (s_0 + 1)}{2}\, r = \frac{(s_0 r)(s_0 + 1)}{2} \ll \frac{s_0^2}{2}$ 

QED. 
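
The expected demands of Theorems 7.3 and 7.4 are simple sums, and a short sketch can confirm that the constant-r special case collapses to the closed forms. The function name and the dictionary representation of $f_s$ and $b_k$ are assumptions made for this illustration.

```python
from typing import Dict, Tuple


def expected_demand(a: float, f_s: Dict[int, float],
                    b: Dict[int, float]) -> Tuple[float, float]:
    """Eqs. 7.2.5: (sum of r_k, sum of k * r_k), with r_k = a * f_s(k) / b_k."""
    p_bar = m_bar = 0.0
    for k, prob in f_s.items():
        r_k = a * prob / b[k]
        if r_k >= 1.0:
            raise ValueError(f"queue {k} is saturated (r_k >= 1)")
        p_bar += r_k
        m_bar += k * r_k
    return p_bar, m_bar
```

Choosing service rates proportional to the size distribution, $b_k = b f_s(k)$, makes every $r_k$ equal to a/b, so with four size classes and a/b = 0.25 the sums reduce to $s_0 r = 1$ and $s_0(s_0+1)r/2 = 2.5$.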



It is interesting to note that, when $s_0$ is large, the 
distribution $f_s(i)$ may be approximated by a continuous density 
function $f_s(u)$ if we regard the range $[1, s_0]$ of job sizes as being 
continuous. In this case we may regard the set of queues, Q, 
as being a continuum of queues, and use the notation Q(u) to 
denote the queue into which jobs of size u are arriving. The 
arrival rate to Q(u) is: 

(7.2.11)   $a_u = a\, f_s(u), \qquad u \in [1, s_0]$ 

Let $b_u$ denote the service rate of size u jobs, and then 

(7.2.12)   $r_u = \frac{a_u}{b_u}$ 

By analogy with eqs. 7.2.5, 

(7.2.13)   $\bar{p}_B = \int_0^{s_0} r_u\, du, \qquad \bar{m}_B = \int_0^{s_0} u\, r_u\, du$ 

Again, if $r_u = r$ is constant, we obtain the same results 
as eqs. 7.2. 
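
As a worked check, the integrals of eqs. 7.2.13 can be evaluated for constant $r_u = r$ and compared with the discrete sums of Theorem 7.4. The comparison in the last line is an observation added here, not a statement from the text.

```latex
\begin{align*}
\bar{p}_B &= \int_0^{s_0} r \, du = s_0\, r, \\
\bar{m}_B &= \int_0^{s_0} u\, r \, du = \frac{s_0^2}{2}\, r, \\
\frac{s_0 (s_0 + 1)}{2}\, r - \frac{s_0^2}{2}\, r &= \frac{s_0}{2}\, r .
\end{align*}
```

The processor demand agrees exactly with the discrete case, and the memory demand differs from the discrete sum only by $s_0 r / 2$, which is small relative to $s_0^2 r / 2$ when $s_0$ is large.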
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7.2.2. The Logarithmic Queue 

When the size $s_0$ of the largest job is large, it may become 
impractical to implement a large network of queues, such as that 
of Figure 7-2. Indeed, when demand is not too high, the 
probability that queue k is non-empty is small; such a queue 
structure would comprise a mostly-empty set of queues. 

A general approach to the problem of reducing the number 
of empty queues is to establish classes of comparable-size jobs, 
and to sort jobs entering the standby set into a system of FCFS 
queues, one for each class. Suppose the number of classes is 
chosen to be K. Then we must choose K size-intervals $(s_{k-1}, s_k)$ 
in order to define the classes $S_k$: 

(7.2.14)   $S_k = \{\, s \mid s$ is the size of some job in the interval $(s_{k-1}, s_k) \,\}$ 

Figure 7-2 may be regarded as $s_0$ classes with $S_k = \{k\}$, and 
if we choose $K < s_0$, it is not hard to see that the total expected 
balance set demands $(\bar{p}_B, \bar{m}_B)$ will be smaller (under the conditions 
of Theorem 7.3), because more work will be allowed to pile up 
in the (smaller number of) queues. 

One method for choosing the boundaries of the classes $S_k$ 
is to make the arrival rate into each class be the same: 

(7.2.15)   $a_k = \frac{a}{K} = \sum_{i \in S_k} a\, f_s(i)$ 

where $f_s(i)$ is the probability $\Pr[s = i]$. A much more interesting 
method of sorting, the logarithmic queue, has particularly 
useful properties. 

The structure of the logarithmic queue is shown in Figure 7-3. 
Jobs are sorted by size into one of $[\log_2 s_0]$ FCFS queues (here, 
the notation [x] means the greatest integer $i \le x$). The classes 
are defined to be 

(7.2.16)   $S_k = \{\, s \mid s \in [2^k, 2^{k+1}) \,\}, \qquad k \in Q$ 

where $Q = \{1, \ldots, k, \ldots, [\log_2 s_0]\}$ is the set of queues. When 
a job of size s enters the standby set, it is placed at the end 
of queue $k = [\log_2 s]$. 
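
The placement rule k = [log2 s] can be computed without floating-point logarithms. The sketch below is one way to do it in Python, using the bit length of the integer size; the function name is hypothetical. Note that under this rule a job of size 1 maps to index 0, which lies outside the index set Q as written above, so an implementation would need its own convention for such jobs.

```python
def log_queue_index(size: int) -> int:
    """Queue index k = [log2 size], the greatest integer not exceeding log2(size)."""
    if size < 1:
        raise ValueError("job size must be a positive integer")
    # For a positive integer, bit_length() - 1 equals floor(log2(size)).
    return size.bit_length() - 1
```

For instance, sizes 4 through 7 all map to queue 2, and size 8 starts queue 3.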

With the k-th queue is associated a pair $(q_k, \bar{s}_k)$, $q_k$ being 
the time quantum and $\bar{s}_k$ being the typical size of a queue k job. 

The probability distribution $f_{s_k}(i)$ for jobs in class $S_k$ is 

(7.2.17)   $f_{s_k}(i) = \begin{cases} \dfrac{f_s(i)}{\sum_{j \in S_k} f_s(j)} & \text{if } i \in S_k \\ 0 & \text{otherwise} \end{cases}$ 

The average job size in class $S_k$ is: 

(7.2.18)   $\bar{s}_k = \sum_{i \in S_k} i\, f_{s_k}(i)$ 

and we may regard $\bar{s}_k$ as being typical of the jobs in class $S_k$. 
The arrival rate to the k-th class is: 

(7.2.19)   $a_k = \sum_{i \in S_k} a\, f_s(i)$   and   $\sum_{k \in Q} a_k = a$ 

Again, $1/a_k$ is assumed to be the mean of an exponential 
distribution. The service rate for the k-th class is: 

(7.2.20)   $b_k = \sum_{i \in S_k} b(i)\, f_{s_k}(i)$ 

which is the average over $S_k$ of the rate b(i) of each job size 
i in $S_k$. Again we make the approximation that $1/b_k$ is the mean 
of an exponential distribution, so that we may use Theorems 7.1 
and 7.2 to provide upper bounds on queue lengths and waiting times. 



Theorem 7.5 . Suppose the logarithmic queue structure described 
above is used, and that one job from each class is in service. 
Then the processor and memory demands of the balance 
set are bounded by: 

(7.2.21)   $p_B \le [\log_2 s_0]$ (single-process computations), $\qquad m_B < 2 s_0$ 

where $s_0$ is the size of the largest job. 



Proof: If one single-process computation at a time is in B 
from each class, then at most $[\log_2 s_0]$ processes can be 
demanding a processor. In class $S_k$ the largest job is of size 
$(2^{k+1} - 1)$; then, 

$m_B \le \sum_{k=1}^{(\log_2 s_0) - 1} (2^{k+1} - 1) < \sum_{k=1}^{(\log_2 s_0) - 1} 2^{k+1} = 4 \sum_{k=0}^{(\log_2 s_0) - 2} 2^k$ 

or, 

$m_B < 4 \left( 2^{(\log_2 s_0) - 1} - 1 \right) < 2 s_0$ 

QED. 



Thus, a memory of size $2 s_0$, a logarithmic queue, and 
$[\log_2 s_0]$ processors are sufficient to guarantee service to 
one job from each class. 
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The advantages of the logarithmic queue are: 

1. Fairness. One job from each class $S_k$ is guaranteed 
service, and jobs from each class are serviced in order 
of arrival. 

2. Ability to scale. The boundaries of the classes $S_k$ are 
invariant to $s_0$, except for the upper boundary of class 
$S_K$, $K = [\log_2 s_0]$. 

3. Small number of classes. Unless $s_0$ is small, it is true 
that $[\log_2 s_0] \ll s_0$. 

4. Small processor and memory requirements. From Theorem 7.5, 
no more than $2 s_0$ pages of main memory are needed, and 
no more than $[\log_2 s_0]$ processors are needed, to 
accommodate the balance set B. 

5. Flexibility. Suppose an imbalance of size s appears 
in the balance set memory demand, and that queue j 
is the queue in which size s jobs reside. If the 
scheduler finds queue j empty, it may still satisfy 
the imbalance with 2 jobs from queue (j-1), or 4 jobs 
from queue (j-2), etc. 
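
The flexibility property in item 5 suggests a simple fallback search, sketched below. It is only an illustration: the function name, the dictionary of queue lengths, and the choice to stop at the first queue holding enough jobs are assumptions, and the sketch ignores the exact sizes of the jobs selected.

```python
from typing import Dict, Optional, Tuple


def satisfy_imbalance(size: int,
                      queue_lengths: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Return (queue index, job count) covering an imbalance of `size`, or None."""
    if size < 1:
        raise ValueError("imbalance size must be a positive integer")
    j = size.bit_length() - 1            # queue holding jobs of about this size
    for i in range(j, -1, -1):
        needed = 2 ** (j - i)            # 1 job from j, 2 from j-1, 4 from j-2, ...
        if queue_lengths.get(i, 0) >= needed:
            return i, needed
    return None
```

With an imbalance of size 16 (queue 4), an empty queue 4, one job in queue 3, and five jobs in queue 2, the search skips queue 3 (two jobs would be needed) and returns four jobs from queue 2.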
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7.3. Mathematical Programming Problem, One -Dimensional Case 

We shall formulate mathematical programming problems whose 
solutions are balance policies. 

We require three things of balance policies: maintenance 
of balance, fairness, and the ability to satisfy other objectives 
such as minimum response time. Maintenance of balance is achieved 
by the constraints in the problems, fairness is achieved by making 
the mathematical programming problem operate in conjunction with 
queue structures of the types discussed in Section 7.2, and other 
objectives may be expressed as objective functions in the pro- 
gramming problems. We leave the particular objective function 
unspecified, the final choice being up to the policy designer. 

In the remainder of this section we formulate the problem, 
review alternatives for the objective function, prove a theorem 
that constrains the choice of quanta, present a solution in 
the case of memory balance with minimum-response-time objective, 
and finally discuss briefly why the formulations cannot lead to 
completely optimum solutions. 



7.3.1. The Problem 

A decision point is a real time instant at which the scheduler
is called on to rebalance the system. Suppose that the
balance set demand is (p_B, m_B) at a decision point. Define the
imbalance in B to be:

(7.3.1)    (\delta_p, \delta_m) = (\alpha, \beta) - (p_B, m_B) = (\alpha - p_B, \beta - m_B)
We assume that the scheduler is called on to admit jobs
to B, never to remove jobs from B. On the one hand, since a
job's quantum bounds its time in B, the demand (p_B, m_B) must
eventually fall below (\alpha, \beta). On the other hand, the parameters
\alpha and \beta are chosen to leave (1-\alpha)N processors and (1-\beta)M
memory pages available for unanticipated expansions.

When do the decision points occur? This is basically a 
decision to be made by the policy designer. Possibilities are: 
decision points occur at regular, clocked intervals; they occur
whenever a job exits the balance set B; or they occur whenever
the imbalance exceeds some threshold. 

Define the following parameters:

Q = {1,2,...,K} is the set of job-class indices.

n_k = number of jobs from class S_k selected by the scheduler
      to enter B.

n_k^B = number of jobs from class S_k already in B.

s_k = typical size of job in class S_k.

q_k = quantum assigned to jobs in class S_k.

\alpha, \beta = balance parameters.

N, M = number of processors, number of main memory pages.

\Delta = standard interval used to define processor demand
         in the single-process case (Section 6.2.2).

\eta_0 = minimum tolerable duty factor for each computation.
Problem definition. Find integers {n_k}_{k \in Q} such that the
objective

    F(n_1, ..., n_K)

is extremized, and the constraints

(7.3.2)    { n_k + n_k^B \ge 1 }_{k \in Q}

(7.3.3)    \sum_{k \in Q} n_k s_k \le M \delta_m

(7.3.4)    \sum_{k \in Q} n_k q_k \le N \Delta \delta_p / \eta_0

are satisfied.



The choice of objective function F(n_1, ..., n_K) is discussed
below. The constraint of eq. 7.3.2 means that at least one job
from each class shall be in service. The constraint of eq. 7.3.3
asserts that the total memory requirement of jobs admitted to B
shall not exceed the imbalance of M \delta_m pages of memory. The
constraint of eq. 7.3.4 asserts that the total processor
requirement of jobs admitted to B shall not exceed the imbalance of
N \Delta \delta_p / \eta_0 processors. We have divided by the duty
factor \eta_0 because if each job has minimum duty factor \eta_0,
then N processors may appear as N / \eta_0 processors; see
Section 7.3.3.
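
The three constraints can be checked mechanically for a candidate admission vector. The following Python sketch is illustrative only: the function name `is_feasible` and its argument names are hypothetical, classes are indexed from 0, and eq. 7.3.4 is taken in the form \sum n_k q_k \le N \Delta \delta_p / \eta_0 as read from the text above.

```python
from typing import Sequence


def is_feasible(n: Sequence[int], n_in_B: Sequence[int],
                s: Sequence[float], q: Sequence[float],
                M: float, N: float,
                delta_m: float, delta_p: float,
                Delta: float, eta0: float) -> bool:
    """Check a candidate admission vector n against eqs. 7.3.2 to 7.3.4.
    All sequences are indexed by job class."""
    # (7.3.2) at least one job of every class is in service
    one_per_class = all(nk + nb >= 1 for nk, nb in zip(n, n_in_B))
    # (7.3.3) admitted memory does not exceed the memory imbalance
    memory_ok = sum(nk * sk for nk, sk in zip(n, s)) <= M * delta_m
    # (7.3.4) admitted processor demand does not exceed the processor imbalance
    processor_ok = (sum(nk * qk for nk, qk in zip(n, q))
                    <= N * Delta * delta_p / eta0)
    return one_per_class and memory_ok and processor_ok
```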



7.3.2. The Objective Function 

The objective function F(n_1, ..., n_K) which is to be extre-
mized (i.e., maximized or minimized) is to be specified by the 
policy designer. Some possibilities are, in order of complexity: 
1. Minimize total waiting time. At a decision point, let
   N_k denote the number of jobs in the k-th queue. Then
   the wait of the job at the end of the k-th queue (before
   entering service) is (N_k - n_k) q_k. The objective becomes

       minimize F(n_1, ..., n_K) = \sum_{k \in Q} (N_k - n_k) q_k
                                 = \sum_{k \in Q} N_k q_k - \sum_{k \in Q} n_k q_k

   but N_k q_k is a constant at a decision point, so we have
   the simpler objective

   (7.3.5)    maximize \sum_{k \in Q} n_k q_k

   We shall use eq. 7.3.5 as the expression of a minimum
   response time objective.

2. Minimize weighted sum of waiting times. Let c_1, ..., c_K
   be a set of weights (relative importances) of the
   waiting times in each queue. Then (by analogy with
   eq. 7.3.5) the objective becomes

   (7.3.6)    maximize \sum_{k \in Q} c_k n_k q_k

3. Minimize weighted sum of functions of waiting times.
   Let g_1, ..., g_K be a set of (cost) functions associated
   with the wait of the job at the end of each queue, and
   let c_1, ..., c_K be weights. Then the objective is

   (7.3.7)    minimize \sum_{k \in Q} c_k g_k(N_k - n_k)

These alternatives are meant only to illustrate possibilities, 
not to exhaust them. 
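
A small Python sketch may make the relation between the first objective and eq. 7.3.5 concrete. The function names are hypothetical and classes are indexed from 0; the point shown is only that total wait equals a constant minus the admitted quanta, so minimizing one maximizes the other.

```python
from typing import Sequence


def total_wait(N_q: Sequence[int], n: Sequence[int], q: Sequence[float]) -> float:
    """Objective 1 before simplification: sum of (N_k - n_k) * q_k."""
    return sum((Nk - nk) * qk for Nk, nk, qk in zip(N_q, n, q))


def admitted_quanta(n: Sequence[int], q: Sequence[float]) -> float:
    """Eq. 7.3.5: sum of n_k * q_k, to be maximized."""
    return sum(nk * qk for nk, qk in zip(n, q))


def weighted_admitted_quanta(c: Sequence[float], n: Sequence[int],
                             q: Sequence[float]) -> float:
    """Eq. 7.3.6: sum of c_k * n_k * q_k, to be maximized."""
    return sum(ck * nk * qk for ck, nk, qk in zip(c, n, q))
```

With queue lengths [3, 2], quanta [10, 20], and admissions [1, 1], the constant term is 70, the admitted quanta total 30, and the remaining wait is 40.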
7.3.3. Choice of Quanta

Efficiency requirements place an important constraint on 
the value of the quanta that may be chosen. 



Theorem 7.6. Suppose \eta_0 is given, where 0 < \eta_0 < 1, and we
    want the duty factor \eta to satisfy \eta \ge \eta_0 for all jobs,
    regardless of size. Then the quantum must be at least linearly
    proportional to the job size. Furthermore, not every value of
    \eta_0 in the interval [0,1] is attainable for a given choice
    of the working set parameter \tau.

Proof: Let s_k be the size of jobs in the k-th class, and q_k be
their quantum. Let \lambda(\tau) be the missing-page probability
(Sections 4.2 and 5.4). In a virtual time interval of length q_k, the
process encounters \lambda(\tau) q_k page waits. In addition, at the
start of the quantum q_k, the working set must be demand-paged
into main memory (assuming it is not already there), requiring
an additional s_k page waits. Therefore the duty factor across
the quantum must satisfy:

(7.3.8)    \eta = q_k / (q_k + \lambda(\tau) q_k T + s_k T) \ge \eta_0

Solving eq. 7.3.8 for q_k we find

    q_k \ge \eta_0 T s_k / (1 - \eta_0 - \eta_0 \lambda(\tau) T)

For the given \eta_0, in order that q_k be finite, we must have

(7.3.9)    1 - \eta_0 - \eta_0 \lambda(\tau) T > 0

(For example, \eta_0 = 0.5 requires \lambda(\tau) T < 1, and we must have
\lambda(\tau) T << 1 if q_k is to be reasonably small.) Once \eta_0 and
\tau have been fixed, the quantity

    C_0 = \eta_0 T / (1 - \eta_0 - \eta_0 \lambda(\tau) T)

is fixed, and we have

    q_k \ge C_0 s_k

which was to be shown. In other words, if q_k < C_0 s_k, the actual
value of \eta (eq. 7.3.8) cannot satisfy \eta \ge \eta_0.

To show that not every value of \eta_0 in the interval [0,1]
is attainable for a given choice of \tau, let \tau be given and solve
eq. 7.3.9 for \eta_0:

    \eta_0 < 1 / (1 + \lambda(\tau) T)

Thus, \eta_0 is upper bounded. Compare with the result of Theorem
4.6, which gives this expression as the steady state duty factor
when quantum starts and expirations are ignored.

QED. 

Theorem 7.6 tells us that if we wish to achieve a certain level 
of processing efficiency, we must be willing to associate larger 
quanta with larger jobs. 
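
The bound of Theorem 7.6 is easy to evaluate numerically. The Python sketch below is illustrative: the function names are hypothetical, `lam` stands for the missing-page probability \lambda(\tau), and time is measured in the same units as the traverse time T.

```python
from typing import Optional


def quantum_constant(eta0: float, lam: float, T: float) -> Optional[float]:
    """C_0 such that q_k >= C_0 * s_k guarantees duty factor >= eta0.
    Returns None when eq. 7.3.9 fails, i.e. eta0 is not attainable."""
    denom = 1.0 - eta0 - eta0 * lam * T
    if denom <= 0.0:
        return None
    return eta0 * T / denom


def max_duty_factor(lam: float, T: float) -> float:
    """Upper bound on eta0 for the given lambda(tau) and T."""
    return 1.0 / (1.0 + lam * T)


def duty_factor(q: float, s: float, lam: float, T: float) -> float:
    """Eq. 7.3.8: duty factor across a quantum q for a job of size s."""
    return q / (q + lam * q * T + s * T)
```

Choosing q_k exactly equal to C_0 s_k yields a duty factor of exactly \eta_0; any larger quantum yields a larger duty factor.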
7.3.4. Solution to the Memory Balance Problem

The most important one-dimensional case is the memory- 
balance case, because it finds application immediately in con- 
temporary computer systems. In this case we place much emphasis 
on balancing memory (to avoid thrashing) and little emphasis on 
balancing processor. If the objective function is to minimize 
response time, there is a rather elegant algorithm to find the 
solution to the mathematical programming problem. 

The linear programming problem presented in Section 7.3.1 
very much resembles a classic problem, the knapsack problem [DO].
We are given a collection of objects, each having a certain 
weight and a certain value, and we are to pack them into a knap- 
sack such that a given weight limit is not exceeded and the total 
value of objects packed is maximum. The solution to this prob- 
lem gives insight into the nature of the solution to the memory- 
balance problem. Formally stated, the knapsack problem is: 



The Knapsack Problem. Let Q be a class of object types, N_k be
    the number of objects of type k, w_k be the weight of a type
    k object, and v_k be the value of a type k object. We are
    to find integers {n_k}_{k \in Q} such that the objective

        maximize \sum_{k \in Q} n_k v_k

    is achieved and the constraints

        { 0 \le n_k \le N_k }_{k \in Q}

        \sum_{k \in Q} n_k w_k \le W

    are satisfied, where W is a given positive number.
Theorem 7.7. Suppose in the knapsack problem we have ordered
    the elements of Q such that

        v_1/w_1 > v_2/w_2 > ... > v_k/w_k > ...    (k \in Q)

    and suppose no two classes have the same v_k/w_k ratio. Then
    the optimum solution is:

        for k = 1, 2, 3, ... do:
            choose the largest n_k such that 0 \le n_k \le N_k and

                \sum_{k \in Q} n_k w_k \le W

Proof: The details can be found in Dantzig [DO], but the idea
is very simple. The ratios v_k/w_k may be interpreted as the value
per pound weight of each object. The algorithm merely attempts
to reach the weight limit W by packing in all the objects with
the highest values per unit weight.

QED.



In the general case, we would have to assume that the
ratios satisfy

    v_1/w_1 \ge v_2/w_2 \ge ... \ge v_k/w_k \ge ...    (k \in Q)

instead of the strictly decreasing situation given in Theorem 7.7.
This complicates the algorithm. Since we are about to apply
it to the memory balance problem, which will satisfy a constraint
similar to that in Theorem 7.7, we shall not concern ourselves
further with refinements of the algorithm. Additional solution
methods are found in Dantzig [DO].
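
The procedure stated in Theorem 7.7 can be sketched in Python as follows. The function name `greedy_knapsack` and the 0-based indexing are assumptions made for illustration. Note that with integer counts a ratio-ordered greedy rule is exact for the continuous relaxation but need not always reach the integer optimum; the sketch simply follows the procedure as the theorem states it.

```python
from typing import List, Sequence


def greedy_knapsack(values: Sequence[float], weights: Sequence[float],
                    counts: Sequence[int], W: float) -> List[int]:
    """Pack object types in order of decreasing value-per-weight ratio,
    taking as many of each type as availability and remaining weight allow."""
    order = sorted(range(len(values)),
                   key=lambda k: values[k] / weights[k], reverse=True)
    n = [0] * len(values)
    remaining = W
    for k in order:
        n[k] = min(counts[k], int(remaining // weights[k]))
        remaining -= n[k] * weights[k]
    return n
```

For values [10, 6, 3], weights [2, 2, 3], availability [2, 5, 1], and W = 9, the ratios are 5, 3, and 1, and the procedure packs two objects of the first type and two of the second.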






The Memory Balance Problem. Let Q be a set of queues, as
    discussed earlier. Let s_k be the typical size of jobs of type
    k, q_k be the quantum associated with type k jobs, and N_k
    be the number in the k-th queue at a decision point. We
    are to find integers {n_k}_{k \in Q} such that the objective

        maximize \sum_{k \in Q} n_k q_k

    is achieved and these constraints are satisfied:

        { 0 \le n_k \le N_k }_{k \in Q}

        { n_k^B + n_k \ge 1 }_{k \in Q}

        \sum_{k \in Q} n_k s_k \le M \delta_m

    where M is the main memory size, n_k^B is the number of type k
    jobs in the balance set B, s_k is the size of a type k job,
    and \delta_m is the memory imbalance of B.



The objective function, which minimizes response time, is the 
same as eq. 7.3.5. There is one constraint more than in the 
knapsack problem, namely that the balance set B shall contain 
at least one object of each class. There is no explicit proc- 
essor balance condition because we assume a sufficiency of pro- 
cessors. We shall see in Chapter 8 that we can properly match 
processor and memory resources to the given program mix; thus 
we may suppose there are enough processors as long as we abide 
by the memory constraint and do not change the program mix. 

The solution to the memory balance problem is given in the 
next theorem. 



Theorem 7.8. The solution to the memory balance problem is as
    follows. Let Q = {1,2,...,K} be the set of queue indices.
    Choose the quanta q_k to satisfy

        q_K/s_K > ... > q_2/s_2 > q_1/s_1

    Let R = {k \in Q | n_k^B = 0} = {k_1, k_2, ..., k_r | k_1 > k_2 > ... > k_r}.
    Then:

    1. for j = 1,2,...,r do:
           if \sum_{i=1}^{j} s_{k_i} > M \delta_m then goto step 3
           else n_{k_j} = 1;

    2. for k = K,...,2,1 do: choose the largest n_k such that

           0 \le n_k \le N_k

           \sum_{k \in Q} n_k s_k \le M \delta_m - \sum_{j \in R} n_j s_j

    3. done.

Proof: Follows at once by analogy with Theorem 7.7.

QED. 

In words, Theorem 7.8 says: first satisfy the one-job- 
from-a-class constraint, then keep on admitting the largest 
jobs possible until memory is full. If step 1 fails to admit 
jobs to the balance set, step 2 is bypassed; thus, resources 
are reserved for these jobs, in readiness for the next decision
point.
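
One possible reading of the two steps of Theorem 7.8 is sketched below in Python. Several points are assumptions of this sketch rather than statements of the theorem: classes are indexed 0..K-1 in order of increasing size, a class whose standby queue is empty is skipped in step 1, jobs admitted in step 1 before a failure stay admitted, and step 2 counts its admissions in addition to those of step 1. The name `memory_balance` is hypothetical.

```python
from typing import List, Sequence


def memory_balance(s: Sequence[int], N_q: Sequence[int],
                   n_in_B: Sequence[int], capacity: float) -> List[int]:
    """s[k]: job size of class k; N_q[k]: jobs waiting in queue k;
    n_in_B[k]: jobs of class k already in B; capacity: M * delta_m.
    Returns n[k], the number of class-k jobs to admit."""
    K = len(s)
    n = [0] * K
    used = 0
    # Step 1: one job for every class absent from B, largest class first.
    for k in reversed(range(K)):
        if n_in_B[k] == 0 and N_q[k] > 0:
            if used + s[k] > capacity:
                return n          # step 2 is bypassed
            n[k] = 1
            used += s[k]
    # Step 2: fill the remaining memory, largest class first.
    for k in reversed(range(K)):
        extra = min(N_q[k] - n[k], int((capacity - used) // s[k]))
        n[k] += extra
        used += extra * s[k]
    return n
```

With sizes [1, 2, 4], three jobs waiting in each queue, only the smallest class represented in B, and room for 11 pages, step 1 admits one job from each of the two larger classes and step 2 then adds one more large job and one small job.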
7.3.5. On the Optimality of the Solutions

At each decision point the scheduler finds a solution to 
the mathematical programming problem; but this solution is op- 
timum only with respect to the decision point at which it was 
made. 

Put another way, if the scheduler had a complete listing 
of balance set program sizes, together with their completion 
times, it might very well want to make a decision different from 
that which satisfies the mathematical programming problem. A 
decision which appears to satisfy the objective function at time
t_1 may turn out to be poorer across an interval (t_1, t_2) than a
decision which appears not to satisfy the objective function at
time t_1.

Thus, all we can claim about these mathematical programming 
formulations of balance policies is that they produce solutions 
which are optimum (with respect to the given objective function)
across short time intervals, but not necessarily across long time 
intervals. 

We do not feel that this is a serious difficulty. The main 
function of these policies is to keep the computer system balanced 
under the given criteria of fairness. The objective functions 
incorporated into the mathematical programming problems are there 
to accomplish ancillary objectives, namely those beyond balance 
and fairness. Thus, it is not of major importance that the policy 
is only locally optimum with respect to the objective function. 

Of far greater import is: it is possible, under fair balance 
policies, to establish reasonable policies with respect to cri- 
teria such as minimum response time, without requiring exorbitant 
amounts of processor and memory resources. 
7.4. Mathematical Programming Problem, Two-Dimensional Case

We briefly generalize the ideas of the previous section, 
formulating the mathematical programming problem for the two- 
dimensional case, in which it is important to balance both pro- 
cessor and memory equally well. The formulation is very general 
and is presented in the continuous case. Each job in the standby 
set is now a multiprocess computation. 

Consider again the demand space V, illustrated in Figure 7-4,
where the region Q in the unit square is regarded as being a
continuous two-dimensional queue. A demand is a point (u,v);
demands may appear only in the region Q. The demand density
function f_{pm}(u,v) is two-dimensional; that is, the probability
a demand (p,m) falls in a differential region of area (du dv)
at the point (u,v) is given by (f_{pm}(u,v) du dv), and

(7.4.1)    \iint_Q f_{pm}(u,v) \, du \, dv = 1

Again, a is the (exponential) arrival rate of demands into
the standby set. The rate to the queue at the point (u,v) is

(7.4.2)    a(u,v) = a f_{pm}(u,v)

The rate at which jobs leave the queue at the point (u,v) is

(7.4.3)    b(u,v)

Therefore

(7.4.4)    \Pr[queue (u,v) non-empty] = r(u,v) = a(u,v) / b(u,v)

following Theorem 7.1. Assuming that each queue is treated
independently, under a FCFS policy, the expected processor demand
of the balance set is

(7.4.5)    p_B = \iint_Q u \, r(u,v) \, du \, dv

and the expected memory demand of the balance set is

(7.4.6)    m_B = \iint_Q v \, r(u,v) \, du \, dv

These equations are obtained by noting the expected contribution
to processor demand from the queue at (u,v) is

(7.4.7)    u (\Pr[queue (u,v) non-empty]) = u \, r(u,v)

and the expected contribution to memory demand from the queue
(u,v) is

(7.4.8)    v (\Pr[queue (u,v) non-empty]) = v \, r(u,v)

Figure 7-4. Demand space as a continuous two-dimensional queue.
(Labels in the figure: incoming job; queue at point (u,v); demand space V.)

In the special case that r(u,v)=r is constant everywhere in Q: 

(7.4.9) P B = m B = j 

To set up the mathematical programming problem, we define 
the following quantities : 

n(u,v) = number of jobs to be chosen from queue (u,v) at
         a decision point to enter the balance set B.
         Note that n(u,v) \ge 0 is a continuous distribution.

q(u,v) = quantum to be allocated to a job from queue (u,v).
         We assume q(u,v) depends only on (u,v). Since
         a job at (u,v) is a multiprocess computation,
         q(u,v) represents the total virtual time allocated,
         in a pool, to all the processes in the
         computation.

u = processor demand at queue (u,v).

v = memory demand at queue (u,v).

w(u,v) = waiting time of job at end of queue (u,v) until
         it enters service.

g(u,v,x) = cost function associated with queue (u,v)
           when the waiting time there is x.

(\delta_p, \delta_m) = (\alpha, \beta) - (p_B, m_B) is the degree of imbalance.



The Problem. Let (p_B, m_B) be the balance set demand at a
    decision point. We are to find a distribution n(u,v) of jobs
    to enter B, such that the objective

        \iint_Q n(u,v) \, g(u, v, w(u,v)) \, du \, dv

    is minimized, and the constraints

        \iint_Q n(u,v) \, u \, du \, dv \le N \delta_p

        \iint_Q n(u,v) \, v \, du \, dv \le M \delta_m

    are satisfied.



We shall not attempt to discuss implementation issues here 
as we have done for the one-dimensional programming problem in 
Section 7.3.4. The solution n(u,v) to this problem is compli- 
cated by the path effect, discussed in Section 6.5.1. We leave 
this as an area of future research. 
7.5. Summary

The major result of this chapter is: the balance policy to
be used by the scheduler can be expressed as the solution to a
mathematical programming problem.

The solutions produced by these formulations are optimum 
with respect to the objective function across short time inter- 
vals but not necessarily across long time intervals. These 
policies are supposed primarily to be equitable balance policies, 
secondarily to insure reasonable policies toward criteria such 
as minimum response time; thus, these policies meet the objectives 
of this thesis work. 

We showed that there exists a simple, elegant algorithm, 
which finds the optimum set of jobs to admit to the balance set 
at a decision point, which is to be used in the memory balance 
case together with a minimum response time objective function. 
This algorithm is applicable in contemporary computer systems, 
when it is important to prevent thrashing. It is based on an 
analogy with the knapsack problem, a classic linear programming 
problem. 
CHAPTER 8



Applications to Computer System Organization 



8.0. Introduction 

One aspect of the study of the resource allocation problem
has been to set up behavior models for computations in order to 
provide a framework within which we can understand misunderstood 
problems. An equally important aspect of the study is to examine 
how programmers, system designers, and the computer system itself 
might all cooperate in allocating resources. 

We shall discuss three seemingly disparate aspects of computer
system organization. The first, the equipment configuration,
is the relationship among the program mix, the amount of processor,
and the amount of memory. The second, equipment pooling, is
effecting large processor-memory capacity by sharing equipment
at the finest hardware level. The third, multilevel memories,
is showing how to make better use of memory resources. The
relation among these three aspects is: each is concerned with
matching the equipment to the work load.
8.1. Toward Better Programming and System Design

There is every reason to believe that programmers can, 
by careful programming, create programs that run with small, 
compact working sets. They can do this, for example, by design- 
ing algorithms to work locally on information, and by employing 
data structures which induce highly local reference patterns. 
Programmers who cooperate in this way will be rewarded, for 
their working sets will be smaller, memory-usage costs lower, 
and running times shorter. Thus, a first guiding principle, 
for programmers, is to design programs to have small working sets. 

The remaining guiding principles, for system designers, 
are applications of programming generality (Section 1.2) and 
of our results here. 

Perhaps the single most degrading factor in contemporary 
computer systems is the inability to manipulate small quantities 
of information easily. Ideally, the unit of information storage 
and transfer, universally used throughout the entire computer 
system from the highest level of memory to the lowest, should 
be the word. This is simply not feasible in contemporary systems 
on account of the high cost of accessing an item in auxiliary 
storage. The commonly-used compromise is that of paging: each 
page comprises a block of words, the page size being chosen to 
represent a compromise among wasted memory, complexity of record- 
keeping (i.e., page tables, memory usage map), and cost of trans- 
ferring a page into main memory. And yet, the traverse-time cost 
is still so high that paged memory systems have been besieged
with poor performance. Thus, a second guiding principle for the
system designer must be to make it convenient to manipulate small
quantities of information [let us aim for the word]. To do this,
he must take recourse to parallelism in the data channels, and 
in the addressing and accessing mechanisms. 

Our behavior models have verified quantitatively the intui- 
tively obvious fact that sharing of equipment becomes more and 
more successful when there are more and more participants. This 
generates a need for a great number of processors and for a great 
deal of main memory (much of which is wasted in contemporary sys- 
tems because the large traverse time induces so many protracted 
page waits). On the one hand, multiprogramming has made it 
possible to effectively share the memory resource among many 
computations. On the other hand, however, it is not yet possible 
to achieve anywhere near complete utilization of processors. 
Instead, a processor is dedicated to a single process, only one 
instruction at a time being executed, and most of the equipment 
(adders, multipliers, etc.) in a processor is idle (footnote 1). If, instead,
the individual hardware components of several processors were 
placed in pools (adders, multipliers, etc.), it would be possible 
to overlap a great many operations. By making it accessible 
from pools on demand, the same equipment contained in one modern 
processor could be used to service simultaneously a surprisingly 
large number of processes. Thus, a third guiding principle for 
the system designer is to permit pooling of small hardware units. 

In summary, the computer system designer must at the very 
least be guided by these principles: programming generality, 
small-working-set programs, ability to manipulate small quantities 
of information, and ability to pool small hardware units. 



Footnote 1:
Some processors, such as the CDC 6600 and the IBM 360/91, attempt
to overlap operations by looking ahead a short distance in the 
instruction stream; but the monosequential nature of instruction 
streams makes it difficult to overlap more than a few operations. 
8.2. The Equipment Configuration

By the program mix we mean the collection of possible 
computations. By the equipment configuration we mean 
the proper relative choices of the number N of processors 
and the number M of main memory pages to achieve static balance. 
We shall show that, once any two of {program-mix, M, N} are arbi-
trarily given, the third is determined. 

The following is meant to indicate the kind of procedure 
that may be used to determine the equipment configuration; it 
is not meant to be the only possible approach. 

We assume that all jobs are single-process computations, 
that they are statistically independent, and that their working 
sets do not overlap. 



8.2.1. Choosing the Balance Parameters \alpha and \beta

The statistical properties of the program mix are the work- 
ing set size and the duty factor. 

We assume that the working set size \omega(t,\tau) is a stationary
random process (cf. Sections 3.3 and 4.4), and we let \tau be
understood. Thus, we may write \omega instead of \omega(t,\tau). The mean
\bar{\omega} and the variance \sigma_\omega^2 have already been derived
in Theorems 4.4 and 4.5.

The duty factor \eta depends on the choice of working set
parameter \tau, the size \omega of a job's working set, a job's quantum q,
and the traverse time T, as follows. If a job is assigned a
quantum q, it generates q information references. The steady
state missing-page probability is \lambda(\tau), so the job expects to
encounter q \lambda(\tau) page waits due to pages re-entering its working
set. In addition, its working set must be demand-paged into
memory at the start of its quantum, requiring an additional \omega
page waits, one for each page. The total expected page wait
time is (q \lambda(\tau) + \omega) T. The duty factor is

(8.2.1)    \eta = q / (q + (q \lambda(\tau) + \omega) T)
                = 1 / (1 + (\lambda(\tau) + \omega/q) T)

Let

(8.2.2)    \gamma = \lambda(\tau) + \omega/q

so that

(8.2.3)    \eta = 1 / (1 + \gamma T)

For simplicity in the following discussion we assume that \eta is
the same for all jobs (thus, q = C_0 \omega for some constant C_0; cf.
Theorem 7.6).

We assume that, whenever a job in the balance set is not in
page wait, it is running. In order that this be a good
assumption, there must be sufficient processor resources that the
probability

    \Pr[no processor available when a process exits page wait]

is arbitrarily small. We shall see shortly that this is the case.
In this case, we may regard \eta as the probability that a process
is running.

We now define two random variables : W is the total working 
set size of the balance set and P is the total processor require- 
ment of the balance set. We suppose there are n jobs in the 
balance set. From our discussion in Chapter 7, if K is the num-
ber of standby set queues, then n must satisfy n \ge K.
Let \omega_i be the working set size of the i-th job in the balance
set. Then the total working set size W of the balance set is

(8.2.4)    W = \sum_{i=1}^{n} \omega_i

Since the jobs are statistically independent and identically
distributed, we have

(8.2.5)    \bar{\omega}_i = \bar{\omega}
           \sigma_{\omega_i}^2 = \sigma_\omega^2        i = 1,2,...,n

Then also

(8.2.6)    \bar{W} = n \bar{\omega}
           \sigma_W^2 = n \sigma_\omega^2

Define the binary random variable

(8.2.7)    \pi_i = 1 if the i-th job is running
                 = 0 otherwise

From the discussion above,

(8.2.8)    \Pr[\pi_i = 1] = \eta
           \Pr[\pi_i = 0] = 1 - \eta

where \eta is the duty factor. Since the jobs are statistically
independent and identically distributed,

(8.2.9)    \bar{\pi}_i = \bar{\pi} = \eta
           \sigma_{\pi_i}^2 = \bar{\pi^2} - \bar{\pi}^2 = \eta (1 - \eta)        i = 1,2,...,n

The total processor requirement P of the balance set is

(8.2.10)   P = \sum_{i=1}^{n} \pi_i

and

(8.2.11)   \bar{P} = n \bar{\pi} = n \eta
           \sigma_P^2 = n \sigma_\pi^2 = n \eta (1 - \eta)

Now, let numbers \epsilon_M and \epsilon_N be given, with 0 < \epsilon_M < 1 and
0 < \epsilon_N < 1. These numbers \epsilon_N and \epsilon_M represent the allowable
processor and memory overflow probabilities. That is, we want to
choose M and N such that

(8.2.12)   \Pr[W > M] \le \epsilon_M
           \Pr[P > N] \le \epsilon_N



We must proceed carefully, because M and N are not independent. 
But before proceeding, we must indicate how Pr[W>M] and Pr[P>N] 
might be determined. 

The Central Limit Theorem tells us that the sum of n iden- 
tically distributed, statistically independent random variables 
becomes normally distributed for large n. We may therefore 
approximate the distributions of W and P by normal distributions 
(these approximations are surprisingly good, even for n as 
small as 10 or 20; see Feller [F2, Vol. 1, p. 168ff]). That is,
we approximate f_W(u) and f_P(u) by

(8.2.13)   f_W(u) = (1 / (\sqrt{2\pi} \sigma_W)) \exp[ -(u - \bar{W})^2 / (2 \sigma_W^2) ]
           f_P(u) = (1 / (\sqrt{2\pi} \sigma_P)) \exp[ -(u - \bar{P})^2 / (2 \sigma_P^2) ]

and then

(8.2.14)   \Pr[W > M] = \int_M^\infty f_W(u) \, du
           \Pr[P > N] = \int_N^\infty f_P(u) \, du

Therefore, given \epsilon_M and \epsilon_N we can find M and N (using standard
tables for the normal distribution, such as [F2, Vol. 1, p. 167])
such that

(8.2.15)   \int_M^\infty f_W(u) \, du = \epsilon_M
           \int_N^\infty f_P(u) \, du = \epsilon_N

and so relate M to \epsilon_M and N to \epsilon_N. It is now a simple matter
to choose M and N.

Let the memory size M be given. Then choose the largest n
(footnote 2) such that

(8.2.16)   \Pr[W > M] = \Pr[ \sum_{i=1}^{n} \omega_i > M ] \le \epsilon_M

Using this value of n, find the smallest N such that

(8.2.17)   \Pr[P > N] = \Pr[ \sum_{i=1}^{n} \pi_i > N ] \le \epsilon_N

The normal approximation is not the only way. For example,
the more powerful Chernoff Bound [W2, p. 67ff] shows that, given
random variables z_i \ge 0 with common density function f_z(u), there
exists a decreasing function h(A) depending only on f_z(u) such
that

    \Pr[ \sum_{i=1}^{n} z_i > nA ] \le (h(A))^n    with 0 < h(A) < 1

Footnote 2:
In Chapter 7, we required n \ge K, where K is the number of standby
set queues. Here we assume that M and N are large enough so that
the largest values of n satisfying eqs. 8.2.16 and 8.2.17 also
satisfy n \ge K.
It should be clear from eqs. 8.2.16 and 8.2.17 that, once
any one of the three quantities {n,M,N} has been arbitrarily
given, the other two are uniquely determined (all other things
being equal). Thus, it makes no difference which of {n,M,N} is
chosen first.

To choose \alpha and \beta, let {n,M,N} be chosen as above, and
set \beta such that

(8.2.18)   \beta M = \bar{W} = n \bar{\omega}

and set \alpha such that

(8.2.19)   \alpha N = \bar{P} = n \bar{\pi} = n \eta
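
The procedure of this section (choose n from M, then N from n, then \alpha and \beta) can be sketched with the normal approximation using Python's standard `statistics.NormalDist`. The function names and the example numbers are hypothetical; the sketch assumes a positive working-set standard deviation and 0 < \eta < 1, and it caps N at n because no more than n processes can be running.

```python
from math import ceil, sqrt
from statistics import NormalDist


def overflow_prob(mean: float, std: float, limit: float) -> float:
    """Normal approximation to Pr[X > limit] (eq. 8.2.14)."""
    return 1.0 - NormalDist(mean, std).cdf(limit)


def configure(M: float, mean_ws: float, std_ws: float, eta: float,
              eps_M: float, eps_N: float):
    """Return (n, N, alpha, beta) for a given memory size M."""
    # Largest n with Pr[W > M] <= eps_M (eq. 8.2.16).
    n = 0
    while overflow_prob((n + 1) * mean_ws, sqrt(n + 1) * std_ws, M) <= eps_M:
        n += 1
    if n == 0:
        raise ValueError("memory too small for even one job")
    # Smallest N with Pr[P > N] <= eps_N (eq. 8.2.17), never more than n.
    mean_P, std_P = n * eta, sqrt(n * eta * (1.0 - eta))
    N = ceil(mean_P)
    while N < n and overflow_prob(mean_P, std_P, N) > eps_N:
        N += 1
    beta = n * mean_ws / M     # eq. 8.2.18
    alpha = n * eta / N        # eq. 8.2.19
    return n, N, alpha, beta
```

For instance, with M = 1000 pages, mean working set 20 pages, standard deviation 5 pages, duty factor 0.5, and both overflow probabilities set to 0.01, the sketch admits 46 jobs and calls for 31 processors, giving \beta = 0.92.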

The procedure discussed above is a worst-case procedure,
for the following reason. In a real computer system, the values
of \alpha and \beta so selected are lower bounds on the actual values that
may be used without violating the probabilities \epsilon_M and \epsilon_N. That
is, the values of \alpha and \beta actually used may satisfy

(8.2.20)   n \eta / N \le \alpha \le 1
           n \bar{\omega} / M \le \beta \le 1

The reason for this is: the scheduler carefully regulates the
membership of the balance set, dynamically maintaining W within
close tolerance of \beta M and P within close tolerance of \alpha N. The
procedure just described takes no account of this additional
certainty, that W is close to \beta M and P is close to \alpha N. Thus, the
actual variance of W is less than \sigma_W^2 (eq. 8.2.6) and the actual
variance of P is less than \sigma_P^2 (eq. 8.2.11). There is more
freedom to choose larger \alpha and \beta.
8.2.2. How Much Resource Slack?

We refer to the reserve (1-\beta)M pages of memory as the
slack memory, and the reserve (1-\alpha)N processors as the slack
processor. We want to show that, as M and N are increased and
\epsilon_M and \epsilon_N are held fixed, the relative amounts of slack
resources become negligible.

Theorem 8.1. Suppose \epsilon_N and \epsilon_M are given, \alpha and \beta are
    determined according to the procedure above, and we let the
    number n of jobs in the balance set increase without bound
    (appropriately adjusting M and N to satisfy \epsilon_M and \epsilon_N). Then

        \alpha \to 1
        \beta \to 1

Proof: We show that \beta \to 1, since the proof for \alpha \to 1 is exactly
the same. Since 0 < \beta < 1, it is enough to show:

    (1 - \beta) / \beta \to 0

Refer to Figure 8-1, where we have plotted memory usage W, showing
it to be normally distributed with mean \beta M = n \bar{\omega} and standard
deviation \sigma_W = \sqrt{n} \sigma_\omega. It is well known that, given \epsilon_M, the
probability \Pr[W > M] depends only on the distance between M and \beta M.
That is, there exists a fixed constant b > 0 such that

    \Pr[W > M] = \Pr[W > \beta M + b \sigma_W]

Then, as n \to \infty,

    (1 - \beta) / \beta = (1 - \beta) M / (\beta M) = b \sigma_W / (n \bar{\omega})
                        = b \sqrt{n} \sigma_\omega / (n \bar{\omega})
                        = (b \sigma_\omega / \bar{\omega}) (1 / \sqrt{n}) \to 0

QED. 
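
The rate at which the relative slack vanishes can be illustrated numerically. In the Python sketch below, the function names are hypothetical and the constant b is taken to be the standard normal quantile for the overflow probability \epsilon_M, which is an assumption consistent with the normal approximation of Section 8.2.1.

```python
from math import sqrt
from statistics import NormalDist


def relative_slack(n: int, mean_ws: float, std_ws: float, eps_M: float) -> float:
    """(1 - beta) / beta = b * sigma_omega / (mean_omega * sqrt(n))."""
    b = NormalDist().inv_cdf(1.0 - eps_M)   # distance in standard deviations
    return b * std_ws / (mean_ws * sqrt(n))


def beta_for(n: int, mean_ws: float, std_ws: float, eps_M: float) -> float:
    """Balance parameter beta implied by the relative slack."""
    return 1.0 / (1.0 + relative_slack(n, mean_ws, std_ws, eps_M))
```

Increasing the number of jobs a hundredfold cuts the relative slack by a factor of ten, so \beta approaches 1 only as fast as 1/\sqrt{n}.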
Figure 8-1. Memory usage.
8.2.3. Relations Among Processors, Memory, Traverse Time

There is an important three-way relationship among the 
number N of processors, the number M of main memory pages, and 
the traverse time T. If any two of these quantities are given, 
the third is determined. 

Theorem 8.2 . Define BM to be the expected amount of memory to 
match aN processors. Then 

(8.2.21) BM = ocNo)(1+yT) 

where co is the expected working set size, T is the traverse 
time, and 

(8.2.22) y = X(T) + ^ = \(t) + i- 
1 q C 

^ o 

has been discussed at eqs . 8.2.2 and 8.2.3. 

Proof: Let {n,M,N} be chosen as discussed in Section 8.2.1. Then

    $\bar{W} = \beta M = n\omega$

    $\bar{P} = \alpha N = n\eta = \frac{n}{1 + \gamma T}$

where $\eta$ is the duty factor, given by eq. 8.2.3. Eliminating n
between these two equations, we have

    $n = \alpha N (1 + \gamma T)$

so that

    $\beta M = n\omega = \alpha N \omega (1 + \gamma T)$

QED.
Since Theorem 8.2 depends on average value arguments, it
is only an approximation to the actual behavior. Put another
way, we may only regard the relation $\beta M = \alpha N \omega (1 + \gamma T)$ as stating a
necessary, but not sufficient, condition on the hardware
configuration. However, the discussion in Section 8.2.1 shows that
we can regard $\epsilon_M$ and $\epsilon_N$ as confidence levels for this result.

The relation $\beta M = \alpha N \omega (1 + \gamma T)$ gives further insight into the
causes of thrashing (Section 3.6). Recall that large values of
T make the duty factor (and hence the attainable processing
efficiency) very sensitive to small changes in the missing-page
probability (here, represented by $\gamma$). In Figure 8-2 we have
indicated the behavior of the processor-memory ratio:

(8.2.23)    $R = \frac{\beta M}{\alpha N \omega} = 1 + \gamma T$

for T = 1, 10, 100, 1000, and 10000 vtu. It is clear that, when $\gamma$ is
small and T is large, the slope of R is quite steep. Small
fluctuations of $\gamma$ can result in wild fluctuations in R. Thus,
if $R_d = \frac{\alpha}{\beta}\omega(1 + \gamma T)$ is regarded as representing the desired
processor-memory ratio, and $R_a = \frac{M}{N}$ is regarded as representing the
actual processor-memory ratio, then these fluctuations in $\gamma$ cause
$R_d$ to be seriously mismatched to $R_a$.

In Figure 8-3 we have shown that the expected amount $\beta M$
of memory grows linearly with T. Indeed, for large values of T,

(8.2.24)    $\beta M \approx \alpha N \omega \gamma T$

Were we to reduce T by an order or two of magnitude, we could
also reduce the main memory requirement by as much as an order
or two of magnitude, without sacrificing efficiency.
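
The dependence of the memory requirement on the traverse time can be tabulated directly from eqs. 8.2.21 and 8.2.23. This is a small sketch; the function names and the numbers used (2 busy processors, 20-page working sets, $\gamma$ = 0.001) are hypothetical and chosen only for illustration.

```python
def expected_memory(alpha_n, omega, gamma, traverse):
    """Expected pages (beta*M) needed to match alpha_n busy processors."""
    return alpha_n * omega * (1.0 + gamma * traverse)

def ratio(gamma, traverse):
    """Processor-memory ratio R = 1 + gamma*T of eq. 8.2.23."""
    return 1.0 + gamma * traverse

# Hypothetical configuration; traverse times in vtu as in the text.
for traverse in (1, 10, 100, 1000, 10000):
    print(traverse, ratio(0.001, traverse),
          expected_memory(2, 20, 0.001, traverse))
```

With these made-up values, cutting T from 10000 vtu to 100 vtu reduces the expected memory from about 440 pages to about 44 pages for the same number of busy processors.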
Figure 8-2. Desired processor-memory ratio.

Figure 8-3. Relation among processors, memory, traverse time.

The relation $\beta M = \alpha N \omega (1 + \gamma T)$ shows that $\beta M$ can increase if
and only if $\alpha N$ increases, all other things being equal. Thus,
if $\alpha' N < \alpha N$ processors are available, then for some $\beta' < \beta$,
$\beta' M = \alpha' N \omega (1 + \gamma T)$, and $(\beta - \beta')M$ main memory pages stand idle (that
is, they are in no working set). Similarly, if $\beta' M < \beta M$ main
memory pages are available, then for some $\alpha' < \alpha$, $\beta' M = \alpha' N \omega (1 + \gamma T)$,
and $(\alpha - \alpha')N$ processors stand idle. A shortage in one resource
type inevitably results in a surplus of the other.
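
The shortage-and-surplus argument can be sketched as a short calculation. The function below is a hypothetical helper, not something defined in the text; it assumes each busy processor needs $\omega(1+\gamma T)$ pages on average, as in Theorem 8.2, and all numbers are illustrative.

```python
def idle_resources(processors, pages, omega, gamma, traverse):
    """Return (idle processors, idle pages) for a given supply of each.

    processors plays the role of alpha*N and pages the role of beta*M.
    """
    pages_per_processor = omega * (1.0 + gamma * traverse)
    usable_processors = min(processors, pages / pages_per_processor)
    usable_pages = usable_processors * pages_per_processor
    return processors - usable_processors, pages - usable_pages

# omega = 20 pages, gamma = 0.01, T = 100 vtu gives 40 pages per processor.
print(idle_resources(4, 200, 20, 0.01, 100))  # processor shortage
print(idle_resources(4, 120, 20, 0.01, 100))  # memory shortage
```

In the first call memory pages stand idle; in the second a processor stands idle.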

All our claims that large traverse times degrade performance
and strain system resources can be substantiated. Fikes et al.
[F5] and Lauer [L1] report on their experience with the IBM 360/67
Time Sharing System at Carnegie University, in which they replaced
the drum auxiliary store with large (bulk) core storage. For
their system, the drum traverse time is about 10 times larger
than the large core storage transfer time. They report that
throughput was increased by about a factor of 10 when the large 
core replaced the drum. This supports our remarks concerning 
eq. 8.2.24. Since a considerable amount of main memory space 
was reserved for use as buffers for the drum, removing the drum 
made a large quantity of additional memory available. 

Lauer [L1] points out that the equipment rental and maintenance
costs were about 10 per cent higher after the large core
storage was introduced. Since, however, the system capacity is 
effectively 10 times greater with large core storage than without 
it, the increased size of the market (user community) more than 
offsets the increased cost. 
8.3. Pooling

Equipment pooling can remarkably enhance throughput. To 
verify this, we consider the conceptual experiment shown in 
Figure 8-4. 

Theorem 8.3. Suppose n requestors with identically distributed
demands seek to use a given type of resource. Compare two
cases: first, each requestor is given a private supply of
resource; second, each requestor draws on demand from a
pooled supply of the resource. In order that the probability

    $\pi$ = Pr[given requestor fails to obtain required resource]

be the same in both cases, at least $\sqrt{n}$ times as much resource
is needed to provide private supplies as is needed to provide
a pooled supply.



Proof: Each of the n requestors requests a random amount $y_i = y$
of the resource, independently of the others. For simplicity
assume $\bar{y} = 0$. Then

    $\sigma_y^2 = \overline{y^2} - \bar{y}^2 = \overline{y^2} = \int_{|u|>0} u^2 f_y(u)\,du$

but

    $\int_{|u|>0} u^2 f_y(u)\,du \ge \int_{|u|>R} u^2 f_y(u)\,du \ge R^2 \int_{|u|>R} f_y(u)\,du$

thus,

(8.3.1)    $\sigma_y^2 \ge R^2 \Pr[|y| > R]$
Figure 8-4. Pooled vs. private resource supplies. (Case 1: private
supplies of resource. Case 2: pooled resource.)

If

    $Y = \sum_{i=1}^{n} |y_i|$

then

    $\sigma_Y^2 = n\sigma_y^2$

and eq. 8.3.1 becomes

(8.3.2)    $\Pr[Y > R] \le \frac{n\sigma_y^2}{R^2}$



Let $\epsilon > 0$ be given. In Case 1 (Figure 8-4), the probability $\pi$
of the theorem statement becomes, using eq. 8.3.1,

    $\pi = \Pr[|y| > R_1] \le \frac{\sigma_y^2}{R_1^2} = \epsilon$

so that

    $R_1 = \frac{\sigma_y}{\sqrt{\epsilon}}$

and

    (total resource in Case 1) $= nR_1 = \frac{n\sigma_y}{\sqrt{\epsilon}}$

In Case 2, suppose (n-1) requestors have made their requests,
and then the n-th request arrives, his request being $y_n = u$. Then
he fails to obtain his request just when

    $\sum_{i=1}^{n-1} |y_i| \ge nR_2 - |u|$

Therefore the probability $\pi$ of the theorem statement becomes

    $\pi = \int \Pr\left[\sum_{i=1}^{n-1} |y_i| \ge nR_2 - |u|\right] f_y(u)\,du$

    $\quad = \int \Pr\left[\sum_{i=1}^{n-1} |y_i| + |u| \ge nR_2\right] f_y(u)\,du$

    $\quad = \Pr\left[\sum_{i=1}^{n-1} |y_i| + |y_n| \ge nR_2\right]$

    $\quad = \Pr\left[\sum_{i=1}^{n} |y_i| \ge nR_2\right] \le \frac{n\sigma_y^2}{(nR_2)^2} = \epsilon$

from eq. 8.3.2. Then

    $R_2 = \frac{\sigma_y}{\sqrt{n\epsilon}}$

and

    (total resource in Case 2) $= nR_2 = \frac{\sqrt{n}\,\sigma_y}{\sqrt{\epsilon}}$

Finally,

    $\frac{\text{(total resource in Case 1)}}{\text{(total resource in Case 2)}} = \frac{n\sigma_y/\sqrt{\epsilon}}{\sqrt{n}\,\sigma_y/\sqrt{\epsilon}} = \sqrt{n}$

QED.



It is therefore quite clear that sharing and pooling can 
significantly increase the usable capacity of a given amount 
of equipment, especially when n is large. 
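
The two totals derived in the proof of Theorem 8.3 can be compared numerically. This is a minimal sketch using the bounds $n\sigma_y/\sqrt{\epsilon}$ (private) and $\sqrt{n}\,\sigma_y/\sqrt{\epsilon}$ (pooled); the function names and the sample values of $\sigma_y$ and $\epsilon$ are assumptions made for illustration.

```python
import math

def private_total(n, sigma, eps):
    """Total resource when each of n requestors has a private supply."""
    return n * sigma / math.sqrt(eps)

def pooled_total(n, sigma, eps):
    """Total resource when all n requestors draw on one pool."""
    return math.sqrt(n) * sigma / math.sqrt(eps)

# Made-up values: sigma = 1 unit of resource, eps = 0.01.
for n in (1, 4, 100, 10000):
    saving = private_total(n, 1.0, 0.01) / pooled_total(n, 1.0, 0.01)
    print(n, round(saving, 6))  # ratio of private to pooled requirement
```

Since both totals come from the same Chebyshev-type bound, the ratio depends only on n and not on the chosen $\sigma_y$ or $\epsilon$.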

It is one matter to realize that pooling at the finest 
level of detail is beneficial, but it is quite another matter 
to implement it. For pooling to operate without unnecessary loss 
of speed, it is necessary to dispense with a centrally clocked 
computer system and to rely wholly on asynchronous logic. Luconi 
[L2] has studied some of the rather delicate issues attending 
this problem. 
Existing techniques make pooling of memory resources
possible, but they have not yet made pooling of processing hardware
possible at an equivalent level of detail — many programs can 
reside in one memory unit, but only one process can use a pro- 
cessing unit at a time. 

The problem of pooling the memory resources can be very 
effectively solved by paging. The smaller the page size, the 
better the pooling. Unfortunately the long traverse times that 
predominate in contemporary computer systems make it just as 
expensive to move a small page as a large one. These systems, 
therefore, have been forced into using large page sizes and have 
not always performed as well as expected. Since physical limit- 
ations make it impossible to reduce access times of rotating 
storage devices to the required levels, we must turn to non- 
rotating storage devices and rely increasingly on parallel data 
channels and asynchronous logic in order to effect completely 
successful memory pooling. 

Therefore, the potential for effective pooling of memory 
resources already exists in contemporary computer systems. 

It is not the case in contemporary systems that a potential 
exists for achieving the degree of processor pooling needed. 
There are three reasons for this. 

The first reason is complexity of interconnection . In 
order to satisfy objectives of reliability, expandability, and 
programming generality, it has been standard practice to allow 
each of the (say) n processors free and unrestricted access to 
each of the (say) m main memory modules, as indicated in Figure 8-5. 
It is not hard to see that the complexity of the interconnection
grows as the product (mn), whereas the processor-memory capacity
grows only as the sum (m+n). Indeed, realizing the required large
number of processors and memory units may not, using full
interconnection, be at all feasible. Pooling will have to occur at
a much finer level.

Figure 8-5. Full interconnection of processors and memories.

The second reason is the large amount of information needed
to specify a computation . In Multics, for example, a myriad of 
tables and lists are needed in order to completely specify a 
process's name space and to allow it to be interrupted at ar- 
bitrary times and yet be properly restarted. These tables have 
two deleterious effects. First, it is expensive to switch a 
processor between processes, partly because of all the infor- 
mation that must be loaded into the processor registers, partly 
because of operating system scheduling functions. Second, the 
tables that must be loaded into main memory while a process is 
active occupy considerable space and reduce the memory space 
available to a program's working set. Unless all this software 
complexity is rooted out, it will remain impractical to implement 
pooling of hardware at a fine level. At the end of this chapter
we shall discuss a highly organized name-space information
structure that may one day produce a solution to these problems.

The third reason is lack of parallelism in the hardware. 
Processor pooling implies considerable process activity, which 
in turn implies considerable information movement. Parallelism 
on the data channels between levels of memory and in the addres- 
sing hardware is needed if the memory system is to be capable of 
handling the information flows induced by busy processor hardware. 

Therefore, although there is much work to be done on memory
system organization and the structuring of information, there is
even more work to be done on basic hardware design so that the
required degree of processing can be achieved.
8.4. Multilevel Memory Systems

A memory hierarchy, or multilevel memory, is a sequence of
increasingly capacious and successively slower-access memory
devices. Its general organization is shown in Figure 8-6.
There are n main levels, $M_1, \ldots, M_n$, and m auxiliary levels,
$A_1, \ldots, A_m$. Each main level device may be addressed directly by
a processor. Information residing in auxiliary devices must be
moved into a main level (namely, $M_n$) before it can be referenced.
We assume that information can migrate only between adjacent
levels. It is also possible that each auxiliary device (such
as $A_2$) feeds directly into $M_n$, rather than into another auxiliary
device.

By splitting main memory into several levels, we intend 
to model computer systems using large core storage in addition 
to the high speed execution store . In these systems, we would 
have n=2; for generality, we allow arbitrary n. The auxiliary 
devices may be drums, disks, tapes, etc. 

Each main level device $M_k$ has an access time $a_k$, representing
the time required to reference one word in level $M_k$. The access
times satisfy

    $a_1 < a_2 < \cdots < a_n$

We take the access time $a_1$ of the fastest memory $M_1$ to be one
virtual time unit (vtu).

Define $T_{ij}$ to be the traverse time from device i to device j.
Here,

    $T_{ij} = \sum_{k=i}^{j-1} T_{k,k+1}$,    i < j

We assume $T_{ij} = T_{ji}$, and that

    $T_{12} < \cdots < T_{n-1,n} < T_{n,n+1} < \cdots < T_{n+m-1,n+m}$

Figure 8-6. Organization of multilevel memory.

These traverse times include queue delays, mechanical position- 
ing times (if any), access times, and page transmission times. 
The traverse times to auxiliary devices usually depend on rot- 
ation times, and so queue delays and transmission times are 
usually negligible components in them. The traverse times be- 
tween main levels are composed mostly of transmission times, 
since access times are small. Typical traverse times, using 
1 vtu = 1 microsecond, are: 

type of device      access time    traverse time (page = 1K words)

thin film           0.1 vtu        100 vtu (0.1 ms)
high speed core     1 vtu          1000 vtu (1 ms)
slow speed core     8 vtu          8000 vtu (8 ms)
high speed drum     10^4 vtu       10^4 vtu (10 ms)
moving-arm disk     10^5 vtu       10^5 vtu (100 ms)

When the traverse times depend on the rotation time of
a device, we assume that shortest-access-time scheduling
techniques, known to be optimum [C2,D3], are used. We may thus
assume that each such traverse time is as small as physically
possible.
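
The additive definition of $T_{ij}$ given earlier in this section can be sketched as follows. The dictionary of adjacent-level traverse times is a hypothetical four-level configuration invented for the example, and the function name is likewise an assumption.

```python
def traverse_time(adjacent, i, j):
    """Traverse time between levels i and j (levels numbered from 1).

    adjacent[k] holds T_{k,k+1}. The result is symmetric in i and j,
    matching the assumption T_ij = T_ji.
    """
    lo, hi = min(i, j), max(i, j)
    return sum(adjacent[k] for k in range(lo, hi))

# Hypothetical adjacent-level traverse times in vtu.
adjacent = {1: 1000, 2: 8000, 3: 10**4}
print(traverse_time(adjacent, 1, 3))
print(traverse_time(adjacent, 4, 2))
```

Because information migrates only between adjacent levels, a transfer across several levels pays every intermediate traverse time.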

The cost of storing one word for one unit of time is less
at lower levels, $M_1$ being the most expensive and $A_m$ least
expensive. The total storage capacity is assumed sufficient for
system needs.

The combined capacity of the main levels should certainly 
be sufficient to contain the balance set. But, in order that 
lower traverse times may be effective, we strongly recommend
that the main levels be sufficiently capacious that standby set 
jobs may also have their working sets present in the main levels. 
Thus, a job re-entering the balance set need not experience 
paging delays at the start of its quantum, and much higher 
processing efficiency is possible. 

We assume information moves upward only on demand, and 
downward as it falls out of use. 

We assume that the unit of storage in the main levels is 
the page, and the page size is the same in each main level. The 
unit of transfer between main levels is the page. We assume the 
unit of storage in the auxiliary levels is the segment. The unit 
of transfer between auxiliary levels, and between $M_n$ and $A_1$, is
also the segment. Since information must reside in a main level
to be addressable, a reference to information in an auxiliary
level must always involve the transfer of a segment into $M_n$
before the reference can be completed. 

The basic strategy we adopt for managing multilevel memories 
is to place information at whatever level results in the least 
memory-usage cost (space-time product). 

There are three questions we must answer: 

1. How are the main levels to be managed? 

2. How are the auxiliary levels to be managed? 

3. What is the role of pre-paging? 

We shall use notions of locality and notions of cost to develop 
guidelines for strategies in each of these areas. 
8.4.1. Managing the Main Levels

In a multilevel memory, a working set allocation policy
guarantees a computation the use of processors if and only if
there is enough uncommitted space among the main levels to
contain its working set. Thus, if $W(t,\tau)$ is the working set of
some computation, it resides somewhere in

    $M_1 \cup M_2 \cup \cdots \cup M_n$

We must refine the working-set definition in order to decide
at which level each page of $W(t,\tau)$ shall reside.

It should be apparent that we should use a value for $\tau$ that
permits most of a program to reside in the main levels, because
space is more abundant than in a single-main-level system, and
because we want to assure higher processing efficiency. For
example, given $\epsilon$, we can choose $\tau$ such that the missing-page
probability satisfies

(8.4.1)    $\lambda(\tau) = 1 - F_x(\tau) \le \epsilon$

where $F_x(\tau)$ is the interreference distribution.

Let Z be a program. For each page i in Z we define the
reference density $\rho_i(t,\tau)$ at time t to be

(8.4.2)    $\rho_i(t,\tau) = \frac{\text{number of references to } i \text{ in } (t-\tau,\,t)}{\tau}$

Then the working set of the computation using Z is

(8.4.3)    $W(t,\tau) = \{\, i \in Z \mid \rho_i(t,\tau) > 0 \,\}$

Let $\theta_0, \theta_1, \ldots, \theta_n$ be a set of thresholds, where

(8.4.4)    $1 = \theta_0 > \theta_1 > \cdots > \theta_n = 0$

Then we partition $W(t,\tau)$ into n subsets, where

(8.4.5)    $W_k(t,\tau) = \{\, i \in Z \mid \theta_{k-1} \ge \rho_i(t,\tau) > \theta_k \,\}$

(8.4.6)    $W(t,\tau) = \bigcup_{k=1}^{n} W_k(t,\tau)$

and $W_k(t,\tau)$ is the set of pages to reside in level $M_k$.

This definition is based on a refinement of locality, the
concept that, during any execution interval, a process favors
some of its pages. The reference density $\rho_i(t,\tau)$ measures the
degree to which page i is being favored. We assume that the set
of favored pages (measured by $W(t,\tau)$) is not likely to change
abruptly. In addition, we assume that the reference densities
$\rho_i(t,\tau)$ are not likely to change abruptly.
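
The partition of a working set by reference density can be sketched in a few lines. This is an illustrative reading of eqs. 8.4.4 through 8.4.6, assuming that a page with density rho goes to the level k for which theta_{k-1} >= rho > theta_k; the function names, page names, densities, and thresholds are all made up.

```python
def level_for_density(rho, thresholds):
    """Return the main level k for a page of density rho, or None.

    thresholds = [theta_0, ..., theta_n], with theta_0 = 1 and
    theta_n = 0, strictly decreasing. Pages with rho == 0 are not
    in the working set and get no level.
    """
    for k in range(1, len(thresholds)):
        if thresholds[k - 1] >= rho > thresholds[k]:
            return k
    return None

def partition_working_set(densities, thresholds):
    """Map each level k to the set of pages W_k assigned to it."""
    levels = {k: set() for k in range(1, len(thresholds))}
    for page, rho in densities.items():
        k = level_for_density(rho, thresholds)
        if k is not None:
            levels[k].add(page)
    return levels

# Hypothetical densities for five pages and thresholds for n = 3 levels.
densities = {"a": 0.8, "b": 0.5, "c": 0.2, "d": 0.05, "e": 0.0}
print(partition_working_set(densities, [1.0, 0.5, 0.1, 0.0]))
```

The most heavily favored pages land in the fastest level, and a page that has not been referenced in the window is left out of every level.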

The capacity of each main level can be determined by 
suitable generalizations of the procedures already discussed 
in Section 8.1. 

The thresholds $\theta_k$ represent tradeoffs between the cost of
not having a page in level $M_k$ and running more slowly, versus
having a page in level $M_k$, running more quickly, and paying the
overhead of moving the page.

One method for setting the thresholds $\theta_k$ is as follows.
Let q be the average quantum, over all jobs, and let S be the
page size. We wish to decide whether to move page i from level
$M_k$ into $M_{k-1}$. If page i is moved, the saving in running time
during q is:

    $q(a_k - a_{k-1})\rho_i(t,\tau)$

where $a_k$ is the access time to level $M_k$. The time required to
move the page is

    $S a_k$

since the page transfer proceeds at the slower of the two
access times $a_k$ and $a_{k-1}$. The page should be moved if

(8.4.7)    $q(a_k - a_{k-1})\rho_i(t,\tau) > S a_k$

That is, whenever

(8.4.8)    $\rho_i(t,\tau) > \theta_{k-1} = \frac{S a_k}{q(a_k - a_{k-1})}$
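
The move-up rule of eqs. 8.4.7 and 8.4.8 amounts to a single comparison. The sketch below uses hypothetical names, with a_slow and a_fast standing for the access times of the current level and the next faster level; the sample page size, quantum, and access times are invented for the example.

```python
def move_threshold(page_size, quantum, a_slow, a_fast):
    """Density above which moving a page to the faster level pays off."""
    return page_size * a_slow / (quantum * (a_slow - a_fast))

def should_move(rho, page_size, quantum, a_slow, a_fast):
    """Direct form of eq. 8.4.7: time saved in a quantum vs. transfer time."""
    return quantum * (a_slow - a_fast) * rho > page_size * a_slow

# Made-up values: 1000-word page, 10000-vtu quantum, 8-vtu and 1-vtu levels.
print(move_threshold(1000, 10000, 8, 1))
print(should_move(0.2, 1000, 10000, 8, 1))
print(should_move(0.1, 1000, 10000, 8, 1))
```

A longer quantum or a wider gap between the two access times lowers the threshold, so more pages qualify for the faster level.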



Hardware not presently commercially available would be
needed to implement automatic memory management using these
ideas. Whenever the reference density of a page in level $M_k$
exceeds $\theta_{k-1}$, the page is moved into $M_{k-1}$. Whenever the
reference density of a page in level $M_k$ falls below $\theta_k$, the
page is a candidate for removal to $M_{k+1}$. The least recently
used non-working set pages in $M_n$ are candidates for removal to $A_1$.



For example, we could associate a $\tau$-bit shift register with
each page-block of main memory. The bit pattern in the
register is shifted once every time unit. A 1 is entered into
the register if the page is referenced, a 0 otherwise. The
number of 1's contained in the register can be used as a
measure of the reference density.
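
The shift-register scheme just described can be simulated in software. This is a minimal sketch only; the class and method names are hypothetical, and a real implementation would be in hardware as the text says.

```python
class ReferenceRegister:
    """Software model of a fixed-width shift register for one page-block."""

    def __init__(self, bits):
        self.bits = bits
        self.value = 0
        self.referenced = False

    def touch(self):
        """Record that the page was referenced in the current time unit."""
        self.referenced = True

    def tick(self):
        """Shift once per time unit, entering 1 if referenced, else 0."""
        mask = (1 << self.bits) - 1
        self.value = ((self.value << 1) | int(self.referenced)) & mask
        self.referenced = False

    def density(self):
        """Fraction of recorded time units in which the page was referenced."""
        return bin(self.value).count("1") / self.bits

reg = ReferenceRegister(4)
reg.touch(); reg.tick()   # referenced
reg.tick()                # not referenced
reg.touch(); reg.tick()   # referenced
print(reg.density())
```

Old references fall off the end of the register automatically, so the count of 1's always reflects only the most recent window.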
8.4.2. Managing the Auxiliary Levels

Because the high access times make it expensive to
reference information stored in auxiliary levels, we assume that
the unit of information storage and transfer among the auxiliary
levels is the segment. Moreover, an entire segment is moved
from $A_1$ into $M_n$ whenever one of its pages is referenced.

The best strategy for managing auxiliary levels is the
least-recently-used strategy. As a segment falls out of use,
it finds its way into the lowest levels. A segment is moved
upward only when it is referenced.
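
One way to picture the least-recently-used strategy is as a ranking of segments by time of last reference, with the most recently used segments occupying the highest auxiliary level. The sketch below is only an illustration under that assumption; the function name, the per-level capacities (counted in segments), and the sample data are hypothetical.

```python
def place_segments(last_used, capacities):
    """Assign segments to auxiliary levels A_1, A_2, ... by recency.

    last_used maps segment name to time of last reference.
    capacities[k-1] is the number of segments level A_k can hold.
    Returns a dict mapping segment name to level number.
    """
    order = sorted(last_used, key=last_used.get, reverse=True)
    placement, start = {}, 0
    for level, capacity in enumerate(capacities, start=1):
        for segment in order[start:start + capacity]:
            placement[segment] = level
        start += capacity
    return placement

# Made-up last-reference times and capacities for three auxiliary levels.
last_used = {"a": 5, "b": 9, "c": 1, "d": 7}
print(place_segments(last_used, [1, 2, 5]))
```

A segment that goes unreferenced is overtaken by others in the ranking and so drifts toward the lowest level.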

The reason this strategy is best follows from a locality 
concept, though not exactly the same concept we have been using 
for program behavior. The locality concept of interest here 
is locality in people's behavior and actions. The longer it has 
been since a person used a certain segment, the more likely it 
is that he has forgotten about it or that he no longer cares 
about it, and so the less likely it is that the segment is of 
immediate use to him. 
8.4.3. What About Pre-Paging?

When, if at all, is it worthwhile to load a job's
information into main storage prior to its execution?

The chief argument for pre-paging is as follows [L1].
Suppose it requires a traverse time T to acquire a page from a
drum auxiliary memory, and that we wish to demand-page an n-page
working set into memory. What is the space-time product (cost)
of this operation? For k=1,...,n, paging in the k-th page results
in k pages standing idle in memory, at a cost of kT. The total
cost is

(8.4.9)    $\sum_{k=1}^{n} kT = \frac{n(n+1)}{2}\,T$

On the other hand, by careful drum management, it is possible 
to write out the n-page working set as a contiguous block and 
read it back in as a contiguous block, the readin operation re- 
quiring about one traverse time T (since the page transmission 
times are so much less than the rotation time) . The cost of 
the operation is nT, since n pages of memory must be reserved 
before the paging operation can begin. Let C denote the ad- 
ditional cost of identifying working set pages (so that they can 
be paged out as a block) and carrying out the page-out operation. 
Then pre-paging is better if 

(8.4.10)    $nT + C < \frac{n(n+1)}{2}\,T$

It is usually possible to make the cost C small enough to satisfy 
eq. 8.4.10. Apparently, then, pre-paging is worthwhile when 
used to obtain information from a rotating device. 
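
The comparison in eq. 8.4.10 is easy to evaluate for sample numbers. The sketch below uses hypothetical function names and made-up values for n, T, and the overhead C.

```python
def demand_paging_cost(n, traverse):
    """Space-time cost of demand-paging an n-page working set (eq. 8.4.9)."""
    return n * (n + 1) / 2 * traverse

def prepaging_cost(n, traverse, overhead):
    """Space-time cost of reading n pages as one block, plus overhead C."""
    return n * traverse + overhead

def prepaging_is_better(n, traverse, overhead):
    """Condition of eq. 8.4.10."""
    return prepaging_cost(n, traverse, overhead) < demand_paging_cost(n, traverse)

# Made-up values: 20-page working set, T = 10000 vtu, C = 500000 page-vtu.
print(demand_paging_cost(20, 10000))
print(prepaging_cost(20, 10000, 500000))
print(prepaging_is_better(20, 10000, 500000))
```

Rearranging eq. 8.4.10 shows that pre-paging wins whenever C is less than n(n-1)T/2, so the allowable overhead grows quadratically with the working set size.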
If the information to be pre-paged resides in a large
core storage, where the traverse time depends only on page trans- 
mission time, pre-paging is not worthwhile. With rotating de- 
vices, it takes about as much time to move a block of n pages 
as it does to move one page, whereas with non-rotating devices 
it takes one traverse time to move each page. There is clearly 
no gain from pre-paging information stored in a non-rotating 
device. 

Nevertheless, we do not believe pre-paging is worthwhile 
in the multiprocess computer systems we have described. The 
argument given to derive eq. 8.4.10 depended on there being 
no sharing of information. If a working set is to be paged 
out in a block, we must exclude the shared pages from this oper- 
ation. But then, when paging the working set back in, these 
shared pages may not be available, and additional effort is 
needed to locate them. The costs C (eq. 8.4.10) of identifying 
pages, of careful drum management, of handling the page-out 
operation, and of recovering the missing shared pages, can easily 
outweigh the potential savings. Other arguments against
pre-paging have been presented in Section 3.4.

We do not, therefore, subscribe to pre-paging working set 
pages in multiprocess computer systems, unless no sharing is 
possible. Furthermore, if the multilevel memory system of 
Figure 8-6 is used, it is unlikely that a working set of a 
standby set computation will leave the main levels, so there is 
no need for pre-paging. 
We do, however, feel that it is possible to anticipate
that an item will be referenced even before it is in a working 
set, and begin moving it into higher levels beforehand. This 
requires a new concept of information structures, which we 
discuss in the next section. 
8.5. The Environment Graph Information Structure

Recent work by Dennis [D10] on the design of highly
parallel computer systems has produced interesting concepts which
can greatly simplify the solution to the resource allocation
problem. The most important concept is that of the environment
graph information structure.

A naming scheme* is a set of rules for relating occurrences
of identifiers to items represented in the computer's memory
system. We assume here that the same naming scheme is used
throughout the entire memory system from the lowest level to
the highest, and throughout the entire execution of a process
from its first reference to the last. This means that all
references can be handled within the hardware, and in particular
that levels of memory can communicate directly with one another
without having to consult an operating system procedure.

The environment graph is a generalization of the file
directory structure [D1] to a level of detail so fine that every
word has a named position. The environment graph is a directed
acyclic graph having a single root node from which there is at
least one directed path to every other node in the structure.
Every node has a label. Figure 8-7 shows an example of an
(unlabelled) environment graph. All paths are assumed to be directed
downward.

* Multics-like systems do not have a uniform naming scheme. All
user information is embedded in a system-wide file directory
structure [D1]. A program makes its first reference to a
segment by means of a tree name, which may incur many costly
references to little-used file directories stored in auxiliary
levels before the desired segment is located. Once located,
a segment is assigned a segment number; subsequent references
take place using segment numbers and are handled automatically
by hardware. The dreadful inefficiency of referencing
information buried in the file directory structure makes it
necessary to have a second, more efficient naming scheme that
streamlines later information references.

If $v_1$ and $v_2$ are nodes, and there is a directed path from
$v_1$ to $v_2$, then $v_2$ is a descendent of $v_1$. A subgraph is any node
v together with all its descendents; subgraphs represent data
structures, such as files, arrays, procedures, etc. Figure 8-8
shows how the linear sequence of instructions

    $V = (v_1, v_2, \ldots, v_n)$

would be represented. The leaf nodes (those with no descendents) 
represent actual data values, whereas internal nodes represent 
named information structures (an internal node is always inter- 
preted as the root of a subgraph). 

A data value or structure is identified by selecting a path 
to it from the root node. A special data type, the pointer , may 
designate some internal node as being the most recent reference 
point of a process. The process makes new references with res- 
pect to its pointer, not with respect to the root. 

Define the k-orbit of a node v to be the set of nodes that
are connected to v by a shortest undirected path of length k.
The k-sphere of a node v is the union of the j-orbits for $1 \le j \le k$. If
a process has its pointer at node v it will generally make its
next reference to some node in the 1-sphere of v, its second
reference to some node in the 2-sphere of v, and its k-th
reference to some node in the k-sphere of v. Therefore the environment
graph can be used to anticipate references: if we observe the
pointer at node v, we can say reliably that the next k
references will occur within the k-sphere of v.

Figure 8-7. An environment graph.

Figure 8-8. Representation of a linear sequence of words.

Consider the multilevel memory system shown in Figure 8-9.
Suppose a node v residing in level $M_k$ is referenced, requiring
it to be moved into level $M_1$. In anticipation of future
references, v's 1-orbit should be moved into level $M_1$, its 2-orbit
into level $M_2$, and so on: when a node is moved upward k levels,
its (n-k)-sphere is moved upward k levels. The k-orbit of any
node v in level $M_1$ should not lie below level $M_k$ in the memory
system.
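
The k-orbits and k-spheres used for anticipating references can be computed with a breadth-first search over the graph with edge directions ignored. The following is a minimal sketch; the function names and the small example graph are assumptions made for illustration, not part of the design described in the text.

```python
from collections import deque

def orbits(edges, v, k_max):
    """Return {k: nodes at undirected shortest-path distance k from v}."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {v: 0}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        if dist[node] == k_max:
            continue  # do not explore beyond the largest orbit requested
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    result = {k: set() for k in range(1, k_max + 1)}
    for node, d in dist.items():
        if d >= 1:
            result[d].add(node)
    return result

def sphere(edges, v, k):
    """Union of the j-orbits of v for 1 <= j <= k."""
    return set().union(*orbits(edges, v, k).values())

# Hypothetical environment graph given as directed (parent, child) edges.
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("a", "d"), ("b", "d")]
print(orbits(edges, "a", 2))
print(sphere(edges, "a", 2))
```

With the pointer at node a, the 1-orbit would be the first candidate for promotion and the 2-orbit the next, following the placement rule described above.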

Since the file directory structure used in many contemporary 
computer systems may be regarded as an environment graph whose 
nodes are segments (a node is a directory segment if and only 
if it is an internal node), a similar procedure might be used 
to anticipate segment references. Keep all of a segment on one 
level. If we observe that a directory at level k is consulted, we
bring it into level $M_1$, all its contents to level $M_2$, etc.

By using the environment graph information structure to- 
gether with a uniform naming scheme and highly parallel auto- 
matic memory management hardware, these goals are met: 

1. There is sufficient detail in the environment graph 

to specify a process, so that little more than a pointer 
is needed to remember where the last reference took place. 
This eliminates complex auxiliary tables needed to spec- 
ify a computation, conserves memory, and permits rapid 
inter-process switching. 
2. Sharing is natural to implement. User A, whose local
root node is $v_A$, can share subgraph $v_B$ of user B simply
by introducing the edge $(v_A, v_B)$ into the environment
graph (with B's permission).

3. Protection is natural to implement. User A cannot
reference any node $v_B$ for which there is no path
$(v_A, v_B)$; and the path cannot be established without
permission from user B.

4. Locality is implied by the k-spheres. Given that a
process has referenced node v, its next k references
will generally fall within the k-sphere of v. In
managing multilevel memories, the k-orbit of any node
v in level $M_1$ should not lie below level $M_k$. Working
set concepts can be used to decide when a node is to
be moved downward.

Figure 8-9. Multilevel memory for use with environment graph.
8.6. Summary

Programmers and system designers should keep in mind
certain guidelines, where applicable:

1. locality.

2. programming generality.

3. uniform naming schemes.

4. pooling of equipment at the finest level of detail.

5. parallelism.

6. ability to manipulate small quantities of information.

The equipment configuration can be described analytically.
Relations among program properties, processor-memory resources,
and traverse times were derived. There is strong evidence
favoring the use of large core storage at the upper levels of
memory.

In order to utilize equipment fully and to obtain the 
required capacity, it is necessary to pool small hardware units. 
If this is done successfully, it is possible to obtain many times 
the capacity with little more equipment than is currently used 
in computer systems. 

Management of multilevel memories can be handled using 
generalized working set concepts. The environment graph infor- 
mation structure provides a method for anticipating information 
references . 
CHAPTER 9



Performance Measures and Accounting Procedures 



9.0. Introduction 

Once we have accepted the working set model and the ideas 
of demand and balance as being valid and useful approaches to 
the resource allocation problem, the set of performance measures 
is more clearly defined. We shall review the relevant probability 
distributions and indicate how their measurement is useful, not 
only for proper regulation of the computer system, but also for 
assisting the administration in setting its operating policies. 
We shall complete the discussion begun in Chapter 1 regarding 
metering of resource usage and attributing of charges; of part- 
icular interest are methods for charging for shared information. 
9.1. What to Measure and Why

The measures fall into three classes, according to their 
purpose: 

1. Working set measures. The distribution of the page
interreference intervals, the working set size
distribution, and the autocorrelation function for working
set size, are needed for the proper (program-dependent)
determination of the working set parameter $\tau$, and for
better understanding of program behavior.

2. System control measures . The joint demand distribution, 
the job running-time distribution, and the queue length 
distribution are needed to decide what equipment is 
needed and to arrive at a solution for the balance 
policy from the mathematical programming problem 
equations. Here, the efficiency, the missing-page 
probability, and the traverse time serve three purposes: 
first, to determine sensitivity to thrashing; second, 

to determine the equipment configuration; and third, 
to provide additional (non-program-dependent) criteria 
for selecting the working set parameter T. Finally, 
the variation of the balance set demand (p^nO about 

B B 

(a,B) is useful for deciding on the choices of the 
balance parameters a and B. 
3. Policy-determining measures . The queue-length dis- 
tribution (equivalently, the distribution of unser- 
viced demands in the standby set) provides indicators 
to the administration when user community demand is 
outstripping supply. The relationships among total 
community demand, bidding, and price, will have to be 
measured in order to be able to set prices. 
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We discuss each measure separately and indicate how it 
applies to each of these three categories. 

(1). Interreference distribution $F_x(u)$. (cf. Section 4.1). The
page interreference intervals $x$, so intimately related to working
set properties, have appeared again and again in our discussions.
Although we defined them in virtual time, because virtual time
renders invisible the vagaries of paging and the arbitrary
sequencing of scheduled jobs, we can also define them in real
time and obtain the real-time working set properties directly.
To do this with assurance that the derivations are correct, we
must first convince ourselves that page waits and scheduling
interrupts are distributed uniformly among the jobs. Since
working set memory management strategies assure statistical
independence among jobs, and since the scheduler is assumed
fair, we may be assured of undistorted measures.
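
One way the interreference intervals might be collected from a recorded page reference string is sketched below in Python. The sketch is illustrative only: the function names, the sample reference string, and the choice of counting time in references (virtual time) are assumptions of the example and are not taken from the text.

```python
from collections import Counter

def interreference_intervals(refs):
    """Return the intervals between successive references to the same page.

    Time is counted in references (an assumption of this sketch).
    """
    last_seen = {}
    intervals = []
    for t, page in enumerate(refs):
        if page in last_seen:
            intervals.append(t - last_seen[page])
        last_seen[page] = t
    return intervals

def empirical_cdf(intervals):
    """Return a function F where F(u) is the fraction of intervals <= u."""
    counts = Counter(intervals)
    total = len(intervals)
    def F(u):
        return sum(c for x, c in counts.items() if x <= u) / total
    return F

refs = ["a", "b", "a", "c", "b", "a"]   # hypothetical reference string
xs = interreference_intervals(refs)     # [2, 3, 3]
F = empirical_cdf(xs)
print(xs, F(2), F(3))
```

Replacing the reference index with a real-time stamp for each reference would give the real-time form of the same distribution.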

(2). Working set size distribution $F_\omega(u)$. (cf. Section 4.4).
Measurements of individual working set sizes are needed to obtain
more insight into the behavior of programs, answering questions
such as: How strong is locality across the range of program
types? How does $\omega(t,\tau)$ vary across the execution of a
program? How successful can programmers be who attempt to design
programs with small, compact working sets?
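
A minimal sketch of such a measurement follows, assuming the working set at time t is taken to be the set of pages among the last tau references made before t. The reference string, the function names, and the reference-count time base are hypothetical choices made for the example.

```python
from collections import Counter

def working_set(refs, t, tau):
    """Return W(t, tau): the pages among the last tau references made before time t."""
    return set(refs[max(0, t - tau):t])

def working_set_sizes(refs, tau):
    """Return the working set size after each reference, for t = 1 .. len(refs)."""
    return [len(working_set(refs, t, tau)) for t in range(1, len(refs) + 1)]

def size_distribution(sizes):
    """Return the fraction of sampled instants at which each size was observed."""
    counts = Counter(sizes)
    return {size: n / len(sizes) for size, n in sorted(counts.items())}

refs = list("aabacbbb")                 # hypothetical page reference string
sizes = working_set_sizes(refs, tau=3)
print(sizes)                            # [1, 1, 2, 2, 3, 3, 2, 1]
print(size_distribution(sizes))         # {1: 0.375, 2: 0.375, 3: 0.25}
```

Repeating the run for several values of tau would show how the size distribution shifts with the working set parameter.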

(3). Correlation function $R_\omega(u,\tau)$. (cf. Section 4.8). The
correlation between working set sizes at two times is invaluable
not only for examining locality, but also for assisting in the
proper choice of $\tau$ and evaluating the predictive ability of a
measurement of the working set size. To determine $R_\omega(u,\tau)$,
proceed as follows. Let $\{t_j\}$ be a long sequence of $N$
equally-spaced instants at which the size $\omega(t,\tau)$ is sampled,
and let $\omega_j = \omega(t_j,\tau)$ denote the size at time $t_j$.
Then the value of $R_\omega(u,\tau)$ at time spacing $u_i$ is:

$$R_\omega(u_i,\tau) = \frac{1}{N-i} \sum_{k=0}^{N-i} \omega_k \, \omega_{k+i}$$

Two things must be noted: the number $N$ of samples must be large
so that $i$ may become large enough, with $N \gg i$, to make the
samples $\omega_k$ and $\omega_{k+i}$ statistically independent; and
$R_\omega(u,\tau)$ depends on $\tau$, and so it will have to be
measured for a family of $\tau$-values.
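
The lagged-product estimate can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of the text verbatim: the function name and sample data are invented, and the sketch normalizes by the number of products actually formed at each lag.

```python
def size_correlation(sizes, i):
    """Estimate the working set size correlation at lag i.

    sizes holds equally spaced samples of the working set size for one
    fixed value of the working set parameter; the estimate is the mean
    of the products of samples that lie i positions apart.
    """
    n_products = len(sizes) - i
    if i < 0 or n_products <= 0:
        raise ValueError("lag must satisfy 0 <= i < number of samples")
    total = sum(sizes[k] * sizes[k + i] for k in range(n_products))
    return total / n_products

sizes = [1, 2, 3, 2, 1]                 # hypothetical size samples
estimates = [size_correlation(sizes, i) for i in range(3)]
print(estimates)
```

Because the result depends on the working set parameter used to produce the samples, the same function would be applied once per parameter value to obtain the family of curves.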

(4). Working set intersections. (cf. Section 3.1.3). A study
of the size of the intersection between the working sets $W(t,\tau)$
and $W(t+y,\tau)$ of a given process, as a function of $y$, would
provide insight into the predictive ability of working sets.
Also of interest is the effect of an interaction during $(t,t+y)$
on the intersection, as a function of the duration of the
interaction.
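
Such a study might begin with a sketch like the following, which counts the pages common to the working sets at two instants. The definition of the working set as the pages among the last tau references, the function names, and the sample reference string are assumptions made for the example.

```python
def working_set(refs, t, tau):
    """Return the pages among the last tau references made before time t."""
    return set(refs[max(0, t - tau):t])

def intersection_size(refs, t, y, tau):
    """Return the number of pages common to the working sets at t and t + y."""
    return len(working_set(refs, t, tau) & working_set(refs, t + y, tau))

refs = list("abcabcdefdef")             # hypothetical: locality changes midway
overlap = {y: intersection_size(refs, t=3, y=y, tau=3) for y in (0, 3, 4, 6)}
print(overlap)                          # {0: 3, 3: 3, 4: 2, 6: 0}
```

In this contrived string the overlap stays complete while the program remains in its first locality and falls to zero once it has moved to the second.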

(5). The running-time distribution $F_q(u)$. (cf. Section 6.2.2).
In the case of single-process computations, this distribution
is useful for determining processor demand. It may not be
particularly valuable in the case of multiprocess computations,
in which we are more interested in the number, rather than the
duration, of component processes. Moreover, since $q$ is defined
to be the interval between successive interactions, the
distribution $F_q(u)$ tells how often a process will be blocked.



(6). Joint demand distribution $F_{pm}(u,v)$. (cf. Section 7.4).
Knowledge of this distribution is needed to obtain a solution
(a balance policy) from the mathematical programming problem
described in Sections 7.3 and 7.4. Assuming a fair balance
policy, $F_{pm}(u,v)$ is easily measured by taking samples of the
jobs in the standby set queues. Knowledge of this distribution
is also invaluable for assisting the administration in setting
prices and deciding when to purchase equipment. If it is
observed that either of Pr[p=1] or Pr[m=1] is not small, then
either price controls must be enforced to reduce demand or
more equipment must be purchased.

(7). Queue length distributions $F_n(\nu \mid u,v)$. This gives the
length $n$ of the queue at the point $(u,v)$ in the standby set
demand space (Section 7.4). This is again useful for finding
the optimum balance policy and for indicating to the
administration when the total demand is high enough to warrant
new equipment.

(8). Duty factor $\eta(\tau)$. (cf. Sections 4.5 and 5.6). Defined
as the fraction of its time in the balance set during which a
process is not in page wait, the duty factor is useful for
determining sensitivity to thrashing (Section 3.6), for
determining the equipment configuration (Section 8.2), and for
estimating processing efficiency.

(9). Missing-page probability $\lambda(\tau)$. (cf. Sections 4.2 and 5.4).
Useful for determining sensitivity to thrashing, and for
estimating paging rates. It can be measured in a time interval $I$ as

$$\lambda(\tau) = \frac{\text{number of page faults in } I}{\text{number of references in } I}$$
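
The ratio of page faults to references can be illustrated with a short Python sketch that replays a recorded reference string. It assumes a working set policy in which a reference faults when the page was not among the preceding tau references (first references always fault); that fault rule, the function name, and the sample string are assumptions of the example.

```python
def missing_page_probability(refs, tau):
    """Return (page faults) / (references) over the whole reference string.

    A reference is counted as a fault if the page has never been
    referenced, or if its previous reference lies more than tau
    references in the past.
    """
    faults = 0
    last_ref = {}
    for t, page in enumerate(refs):
        prev = last_ref.get(page)
        if prev is None or t - prev > tau:
            faults += 1
        last_ref[page] = t
    return faults / len(refs)

refs = list("abcabcabc")                # hypothetical cyclic reference string
print(missing_page_probability(refs, tau=2))   # every reference faults
print(missing_page_probability(refs, tau=3))   # only the first three fault
```

The jump between the two results shows how sharply the measured value can depend on the working set parameter when it is just below the size of a loop.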

(10). Traverse time $T$. It is useful to know how often references
are made to (slower) lower levels of memory, for purposes of
determining sensitivity to thrashing and memory system requirements.

(11). Variation of balance set demand. For a given balance
policy, we can perform experiments to observe the variation of
balance set demand $(p_B, m_B)$ about the desired $(\alpha, \beta)$.
Doing this for a family of $(\alpha, \beta)$ values will yield
information useful for determination of $(\alpha, \beta)$.
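
As one possible way of summarizing such an experiment, the sketch below computes the root-mean-square deviation of observed balance set demands from the desired point. The choice of RMS deviation as the measure of variation, the function name, and the sample observations are assumptions of the example and are not prescribed by the text.

```python
import math

def rms_deviation(observed, alpha, beta):
    """Return the RMS deviations of processor and memory demand from (alpha, beta).

    observed is a list of (processor demand, memory demand) pairs sampled
    from the balance set.
    """
    n = len(observed)
    dev_p = math.sqrt(sum((p - alpha) ** 2 for p, _ in observed) / n)
    dev_m = math.sqrt(sum((m - beta) ** 2 for _, m in observed) / n)
    return dev_p, dev_m

observed = [(0.9, 0.8), (1.1, 0.8), (1.0, 0.6), (1.0, 1.0)]  # hypothetical samples
dev_p, dev_m = rms_deviation(observed, alpha=1.0, beta=0.8)
print(dev_p, dev_m)
```

Running the same summary for each candidate pair of balance parameters would give a table from which the tightest setting could be read off.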

(12). Demand vs. cost curves. (cf. Section 1.4.1). The steady
state curves discussed in Chapter 1, relating cost per unit
resource to total community demand, would be valuable for
assisting administration officials in setting prices. These
curves can be composed from the joint demand distribution
$F_{pm}(u,v)$ resulting from particular price settings. The
administration may have to experiment with prices in order to
determine the general character of the curve.

(13). Bidding and inflation. (cf. Section 1.4.3). Assuming the
existence of a bidding mechanism, it is necessary to know
whether inflation is a problem.

9.2. Charging for Resource Use

Given that memory management at all levels of the memory 
hierarchy is controlled by means of working set or related stra- 
tegies, we observe that there may be no need to explicitly bill 
for processor usage, because a process receives service from 
a processor if and only if its working set is in main memory. 
We merely charge an account for the size and duration of its 
main memory usage; in so doing we implicitly obtain processor 
usage. 

Thus, let $\omega_p(t)$ denote the number of pages in main memory
at time $t$ belonging to process $p$. If $\omega_p(t) = 0$ we
understand that $p$ has no pages in main memory (i.e., it is
neither running nor in page wait). The cost $C_p(I)$ to process
$p$ during a real time interval $I$ for main memory usage is

(9.2.1)   $$C_p(I) = c_0 \int_I \omega_p(t)\,dt, \qquad \text{some } c_0 > 0$$

and $C_p(I)$ implies both processor and memory usage.
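
A charge of this form could be computed from a log of page-count changes, as in the following sketch. It assumes the page count of a process is recorded whenever it changes and is constant in between; the log format, the function name, and the numbers are hypothetical.

```python
def memory_charge(samples, start, end, c0):
    """Return c0 times the integral of the page count over [start, end].

    samples is a list of (time, pages) pairs sorted by time; the page
    count is assumed to hold from each sample time until the next one,
    and the last count is assumed to hold until end.
    """
    total = 0.0
    following = samples[1:] + [(end, 0)]
    for (t0, pages), (t1, _) in zip(samples, following):
        lo, hi = max(t0, start), min(t1, end)
        if hi > lo:
            total += pages * (hi - lo)   # page-time accumulated in this segment
    return c0 * total

samples = [(0, 4), (10, 6), (25, 0), (30, 3)]   # hypothetical log for one process
charge = memory_charge(samples, start=5, end=40, c0=0.5)
print(charge)                                   # 70.0
```

The segment with a page count of zero contributes nothing, which matches the reading that a process holding no pages in main memory is neither running nor in page wait.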

We do not mean to imply that processor usage ought not be 
metered. We only mean to point out that the same mechanism 
that meters memory usage can be used to infer processor usage 
costs . 

When there is sharing, we follow the ideas of Section 5.1,
letting a page in main memory belong to the working set of the
process that most recently referenced it. In this case $\omega_p(t)$
still measures the number of pages belonging to process $p$ at
time $t$, and the cost is still given by eq. 9.2.1. The problem
of attributing pages to processes is an implementation problem,
and has already been discussed in Section 5.1.



Eq. 9.2.1 can be extended easily to memory usage costs
in multilevel memories, where now an owner (Section 5.1) has
pages or segments stored at various levels. Let $\omega_j^k(t)$
denote the number of pages held by owner $j$ at level $M_k$, and
suppose $c_k$ is the cost per unit time to store one page at
level $M_k$. Then, during an interval $I$, owner $j$ is charged

(9.2.2)   $$C_j(I) = \sum_{k \ge 1} c_k \int_I \omega_j^k(t)\,dt, \qquad \text{some } c_k > 0$$

for his resource usage.
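
The multilevel charge can be approximated from periodic samples, as in the sketch below. It assumes each level's page count for an owner is sampled every dt time units and approximates each integral by a sum; the level names, rates, sampling scheme, and function name are hypothetical.

```python
def multilevel_charge(occupancy, rates, dt):
    """Approximate the sum over levels of rate times page-time held.

    occupancy maps a level name to the owner's page counts at that
    level, sampled every dt time units; rates maps a level name to the
    cost per page per unit time at that level.
    """
    return sum(rates[level] * sum(counts) * dt
               for level, counts in occupancy.items())

occupancy = {"core": [4, 4, 6], "drum": [10, 12, 12], "disk": [100, 100, 100]}
rates = {"core": 1.0, "drum": 0.1, "disk": 0.01}    # hypothetical prices
total = multilevel_charge(occupancy, rates, dt=2.0)
print(round(total, 2))                              # 40.8
```

With these invented prices the few pages held in the fastest level dominate the bill, which is the behavior a level-dependent price is meant to produce.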



CHAPTER 10



Conclusions 



By constructing abstract behavior models for ongoing 
multiprocess computations, we have intended to build a frame- 
work within which we can understand misunderstood problems, 
answer unanswered questions, and foresee unforeseen difficulties.

Perhaps more important than the particular models is the 
basic approach. Every one of the models is based on an appro- 
priate locality concept. 

For a variety of reasons it is natural to suppose that, 
during any interval of execution, the majority of programs 
will favor a subset of their information, exhibiting locality 
in their reference patterns. 

A process's working set of information (the pages it has
referenced during the last $\tau$ units of execution) is a measure
of the set of favored pages. Main memory allocation strategies
that grant processors to processes if and only if their working
sets are present in main memory can minimize both memory
usage costs and the possibility of thrashing. By defining a
page's reference density (the fraction of the last T references
it received) we can refine the notion of a working set
for memory systems having several levels of directly-addressable
memory. Pages with the highest reference densities reside in
the highest levels, and pages with the lowest reference
densities reside in the lowest levels.
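
The placement rule in the last two sentences can be illustrated by a short sketch. It assumes the reference density is computed over the last T entries of a recorded reference string and that each level has a fixed page capacity; the capacities, the tie-breaking rule, and the function names are assumptions of the example.

```python
from collections import Counter

def reference_densities(refs, T):
    """Return, for each page, the fraction of the last T references it received."""
    window = refs[-T:]
    counts = Counter(window)
    return {page: n / len(window) for page, n in counts.items()}

def place_by_density(densities, capacities):
    """Assign pages to levels, densest first; capacities[0] is the highest level.

    Ties in density are broken by page name so the result is deterministic.
    """
    ranked = sorted(densities, key=lambda p: (-densities[p], p))
    placement, start = [], 0
    for cap in capacities:
        placement.append(ranked[start:start + cap])
        start += cap
    return placement

refs = list("aaabbbabcd")                   # hypothetical reference string
densities = reference_densities(refs, T=8)
placement = place_by_density(densities, capacities=[1, 2, 4])
print(placement)                            # [['b'], ['a', 'c'], ['d']]
```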

The locality concept behind these working set models 
assumes that a process is unlikely to abruptly change either 
its favored pages or its reference densities. 

It is quite clear that resource allocation can be very
effective if programs do in fact exhibit the locality properties 
we assume. Indeed, the more pronounced the locality behavior, 
the more successful the resource allocation. Because the con- 
cept of a working set is defined independently of a computer 
system, it is perfectly reasonable to encourage programmers to 
construct their programs to have small, compact working sets. 
There is no need to resort to absurdities, like a "declare
working set" statement in PL/I; all that is necessary is that
a programmer get organized, avoid unnecessary jumping from 
region to region in name space, and employ algorithms and data 
structures that induce highly local reference patterns. 

The definition of system demand is another application of 
locality concepts, for we assume that it is possible to measure, 
and act on, a computation's demand before the demand can change 
significantly.

The definition of system demand can be extended from
two resource types (processor and memory) to n resource
types. One simply defines an n-tuple whose i-th position
contains a measure of demand for the i-th resource type. Demand
for resource types beyond processor and memory can be defined
once the appropriate locality concept has been recognized.

Thus, by first asking ourselves the question "What is the
locality concept applicable here?", we have been able to
construct useful models for program behavior. We suspect that this
sort of approach to constructing behavior models may be useful 
in other areas as well. 

The model of a balanced computer system has given insights 
into the causes of thrashing, into the equipment configuration 
problem, into means of satisfying other scheduling objectives 
beyond balance, and into methods of analysis. 

When the computer system is continuously balanced, the 
demand of the balance set is tightly distributed about the de- 
sired demand. Although we cannot accurately predict the demand 
of an arbitrarily given computation, we can accurately predict 
the demand of the balance set. For this reason it is possible 
not only to avoid thrashing, but also to effect the proper 
equipment configuration and be confident that it is correctly 
matched to the work load. 

Balance policies are flexible. By formulating a mathe- 
matical programming problem whose objective function is arbit- 
rary, whose constraints enforce both balance and fairness, and 
whose solution is the set of jobs to be admitted to the balance 
set, we showed that it is possible to establish reasonable
policies with respect to other, arbitrarily given criteria
(such as minimum response time).

The model of a balanced computer system has shown that 
analysis is possible because computations can be made independent 
of one another, inasmuch as resource acquisitions of one compu- 
tation do not interfere with resources in use by another. This 
model has also shown that processor and memory demands cannot 
be treated independently: resource allocation decisions must 
account for both demands at the same time. 

The model of a balanced computer system has many applic- 
ations to contemporary and future problems of computer system 
organization. This model gives quantitative justification to 
many intuitive ideas; for example, the intuitive notion of a 
working set, or the benefits obtainable by sharing information 
and pooling equipment, or the dependence of thrashing on memory 
traverse times. This model affords possible solutions to prob- 
lems for which we have no previous answers, such as the equip- 
ment configuration problem or the thrashing problem. This 
model makes clear which program behavior parameters are
important, and what performance measures ought to be used. The model
suggests better system organizations and better resource allocation
policies. The model can make system designers and administrators
feel confident that there is theoretical justification to their 
decisions. Finally, the model has shown that we are only start- 
ing on the long road to understanding the complex behavior of 
computations and other information-processing activities. 

If we have answered some questions, we have raised others. 
Many of these have already been indicated throughout the text.

The most troublesome problems arise when information is
shared. In our work here, we have made processes statistically
independent, an assumption that is valid only if processes do
not communicate or if shared data is not interlocked. Clearly,
many interesting questions concern non-independent processes.
It is evident that we want two processes to run concurrently
whenever they share information. Ideally, we want to use
scheduling mechanisms and policies that somehow automatically
group processes together according to the information they share.
More work is needed in this area.

We defined a computation to be a collection of mutually 
cooperating processes and information operating in the same name 
space, so that a computation is behaviorally well-defined. Might 
a vaguer definition lead to even more useful models? Can we de- 
fine degrees of cooperation among processes and let the member- 
ship of a computation vary dynamically, according to degrees 
of cooperation? More work is needed in this area. 

Another direction in which the work can be extended is the
so-called distributed data problem. What locality and working
set concepts are important when the data is geographically
scattered, as might be the case in a computer network? Is there any
way to anticipate, on the basis of present or past behavior,
when information should be moved from one geographic location
to another?

We have intended to devise new approaches to modelling com- 
putations, to spark a new kind of thinking about dynamic infor- 
mation processing activities, and to develop new philosophies 
about resource sharing and allocation. We sincerely hope we 
have raised more questions than we have answered. 












BIOGRAPHIC NOTE 



Peter James Denning was born January 6, 1942, in New York 
City. He resided in New York City until 1945, then in Darien, 
Connecticut, until 1960. He graduated (1960) from Fairfield 
College Preparatory School, Fairfield, Connecticut; from Manhattan
College (1964), New York City, with a Bachelor of Electrical
Engineering Degree; from M.I.T., Cambridge, Massachusetts,
with a Master of Science (1965) and with a Doctor of Philosophy 
(1968). During his four years at M.I.T., he has been associated 
with Project MAC. For the first two of these four years he held 
a National Science Foundation Fellowship; for the third year he 
held a National Science Foundation Traineeship; and for the last 
year he worked as a Research Assistant. 

During his summers, Mr. Denning has worked for Bell
Laboratories (1963), IBM Corporation (1964), Project MAC (1965 and
1967), and Aerospace Corporation (1966). His three semesters of
experience teaching the M.I.T. undergraduate subject Theoretical
Models for Computation developed in him a strong desire to teach,
and so he has joined the faculty of Princeton University as
Assistant Professor of Electrical Engineering.

Mr. Denning is a member of Tau Beta Pi, Eta Kappa Nu, Sigma
Xi, and the Association for Computing Machinery.

Mr. Denning married the former Anne DeMarco, of New York 
City, in August, 1964; his first child, Anne Catherine, was 
born in February, 1968. 

Mr. Denning has the following publications: 

1. Queueing Models for File Memory Operation.
(S.M. Thesis). Project MAC Report MAC-TR-21 (Oct. 1965).

2. Protected Service Routines and Intersphere Communication.
Project MAC Computation Structures Group Memo No. 20
(Feb. 1966).

3. Memory Allocation in Multiprogrammed Computer Systems.
Project MAC Computation Structures Group Memo No. 24
(March 1966).

4. Effects of Scheduling on File Memory Operations.
AFIPS Conf. Proc. 30 (SJCC 1967), 9-21.

5. The Working Set Model for Program Behavior.
Comm. ACM 11, 5 (May 1968).

6. A Statistical Model for Console Behavior in Multiuser
Computers. (To appear in Comm. ACM during 1968).

7. Thrashing: Its Causes and Prevention.
(To be published in 1968).

8. Machines, Languages, and Computation. Textbook
co-authored with Jack B. Dennis, to be published by
Prentice-Hall, Inc.






UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R&D

(Security classification of title, body of abstract and indexing
annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author)
Massachusetts Institute of Technology
Project MAC

2a. REPORT SECURITY CLASSIFICATION
UNCLASSIFIED

2b. GROUP
None

3. REPORT TITLE
Resource Allocation in Multiprocess Computer Systems

4. DESCRIPTIVE NOTES (Type of report and inclusive dates)
Ph.D. Thesis, Department of Electrical Engineering, May 1968

5. AUTHOR(S) (Last name, first name, initial)
Denning, Peter James

6. REPORT DATE
May 1968

7a. TOTAL NO. OF PAGES
297

7b. NO. OF REFS
57

8a. CONTRACT OR GRANT NO.
Office of Naval Research, Nonr-4102(01)

b. PROJECT NO.
NR 048-189

c. RR 003-09-01

d.

9a. ORIGINATOR'S REPORT NUMBER(S)
MAC-TR-50 (THESIS)

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)

10. AVAILABILITY/LIMITATION NOTICES
Distribution of this document is unlimited.

11. SUPPLEMENTARY NOTES
None

12. SPONSORING MILITARY ACTIVITY
Advanced Research Projects Agency
3D-200 Pentagon
Washington, D. C. 20301

13. ABSTRACT

The dynamic allocation of limited processor and main memory resources among
members of a user community is investigated as a supply-and-demand problem. The
work is divided into four phases. First is the construction of the working set model
for program behavior based on locality; a computation's working set is a dynamic
measure of this set of favored information. The second phase is the definition and
study of properties of system demand. A computation is the basic demand-making
entity, placing demands jointly on processor and main memory resources. Its system
demand is a pair (processor demand, memory demand). The third phase is the
definition and study of the properties of system balance. Computations that demand
resources are segregated into two classes: the standby set, which is temporarily
denied the use of system resources, and the balance set, which is granted the use
of system resources. The system is balanced when the total system demand matches
the system capacity. The fourth phase is to apply all these ideas to the design
and administration of multiprocess computer systems.

14. KEY WORDS

Computers
Resource allocation
Multiprocess computers
Machine-aided cognition
Multiple-access computers
On-line computers
Real-time computers
Time-sharing
Time-shared computers

DD FORM 1473 (M.I.T.)